Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures. That RFC didn't fully go to completion due to concerns over the performance impact, which Nuno and I didn't have bandwidth to resolve.
Arnaud Le Blanc has now picked up the flag with an improved implementation that includes benchmarks showing an effectively net-zero performance impact, which is good news, as it avoids over-capturing.
The RFC has therefore been overhauled accordingly and is now ready for consideration.
https://wiki.php.net/rfc/auto-capture-closure
--
Larry Garfield
larry@garfieldtech.com
Hey Larry,
Couple questions:
nesting these functions within each other
What happens when/if we nest these functions? Take this minimal example:
    $a = 'hello world';
    (fn () {
        (fn () {
            echo $a;
        })();
    })();
capturing $this
In the past (also present), I had to type static fn () => ... or static function () { ... all over the place, to avoid implicitly binding $this to a closure, causing hidden memory leaks.
Assuming the following:
- these new closures could capture $this automatically, once detected
- these new closures can optimize away unnecessary variables that aren't captured
Would that allow us to get rid of static fn () { declarations, when creating one of these closures in an instance method context?
Greets,
Marco Pivetta
Hi,
On Thursday, 9 June 2022 18:46:53 CEST Marco Pivetta wrote:
nesting these functions within each other
What happens when/if we nest these functions? Take this minimal example:
$a = 'hello world'; (fn () { (fn () { echo $a; })(); })();
Capture bubbles up. When an inner function uses a variable, the outer function
in fact uses it too, so it's captured by both functions, by-value.
This example prints "hello world": The inner function captures $a from the
outer function, which captures $a from its declaring scope.
This is equivalent to
    (function () use ($a) {
        (function () use ($a) {
            echo $a;
        })();
    })();
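A minimal sketch of the "capture bubbles up" behaviour, using only syntax that already exists (single-expression arrow functions, PHP 7.4+). It assumes the proposed multi-statement form would behave the same way; the variable names are illustrative only.

```php
<?php
$a = 'hello world';

// Nested arrow functions: the inner one uses $a, so the outer one captures it too.
$nested = fn () => (fn () => $a)();

// The explicit equivalent, with both levels listing $a in use().
$explicit = function () use ($a) {
    return (function () use ($a) {
        return $a;
    })();
};

// Capture is by value and happens when each closure is created,
// so reassigning $a afterwards does not affect either closure.
$a = 'changed';

$fromNested = $nested();
$fromExplicit = $explicit();

echo $fromNested, PHP_EOL;   // hello world
echo $fromExplicit, PHP_EOL; // hello world
```

Both closures still see the value $a had when they were created, which is what by-value capture means here.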
capturing $this
In the past (also present), I had to type static fn () => ... or static function () { ... all over the place, to avoid implicitly binding $this to a closure, causing hidden memory leaks.
Assuming the following:
- these new closures could capture $this automatically, once detected
- these new closures can optimize away unnecessary variables that aren't captured
Would that allow us to get rid of static fn () { declarations, when creating one of these closures in an instance method context?
It would be great to get rid of this, but ideally this would apply to Arrow
Functions and Anonymous Functions as well. This could be a separate RFC.
--
Arnaud Le Blanc
I've tried this in the past, and this is not possible due to implicit $this
uses. See
https://wiki.php.net/rfc/arrow_functions_v2#this_binding_and_static_arrow_functions
for a brief note on this. The tl;dr is that if your closure does "fn() =>
Foo::bar()" and Foo happens to be a parent of your current scope and bar()
a non-static method, then this performs a scoped instance call that
inherits $this. Not binding $this here would result in an Error exception,
but the compiler doesn't have any way to know that $this needs to be bound.
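A small sketch of the situation described above, assuming PHP 8 and two hypothetical classes (Base and Child) invented for this example. The call parent::bar() looks static but needs $this, so a closure marked static cannot make it.

```php
<?php
class Base
{
    public function bar(): string
    {
        // An instance method: it needs $this to do its work.
        return 'bar called on ' . get_class($this);
    }
}

class Child extends Base
{
    public function withThis(): string
    {
        // Looks like a static call, but it is a scoped instance call
        // that uses the $this bound to the closure.
        return (fn () => parent::bar())();
    }

    public function withoutThis(): string
    {
        try {
            // "static" prevents $this from being bound, so the same call fails on PHP 8.
            return (static fn () => parent::bar())();
        } catch (\Error $e) {
            return 'Error';
        }
    }
}

$child = new Child();
echo $child->withThis(), PHP_EOL;    // bar called on Child
echo $child->withoutThis(), PHP_EOL; // Error
```

Nothing in the closure body mentions $this, which is why a compile-time check cannot tell whether binding is required.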
Regards,
Nikita
Hey Nikita,
Do you have another example? Calling instance methods statically is...
well... deserving a hard crash :|
Marco Pivetta
Maybe easier to understand if you replace Foo::bar() with parent::bar()?
That's the most common spelling for this type of call.
I agree that the syntax we use for this is unfortunate (because it is
syntactically indistinguishable from a static method call, which it is
not), but that's what we have right now, and we can hardly just stop
supporting it.
Regards,
Nikita
Hey Nikita,
Dunno, it's a new construct, so perhaps we could do something about it.
I'm not suggesting we change the existing fn or function declarations,
but in this case, we're introducing a new construct, and some work already
went in to do the eager discovery of by-val variables.
Heck, variable variables already wouldn't work here, according to this RFC
:D
Marco Pivetta
We're not introducing a new construct. We're just extending existing fn()
functions to accept {} blocks, with exactly the same semantics as before. I
would find it highly concerning if fn() => X and fn() => { return X; } had
differences in capture semantics. Those two expressions should be strictly
identical -- the former should be desugared to the latter.
Nikita
Hi Larry,
Nice work. Well-described behaviors.
One question, more about future scope or related functionality:
Could a future RFC for "short-methods", as described in one of your declined RFCs (https://wiki.php.net/rfc/short-functions), be revived without conflicting with this one in the scope of methods?
    class Foo {
        public string $firstName = 'John';
        public function getFirstName(): string => $this->firstName;
    }
I'm asking whether I understand the scopes of this and the previous RFCs correctly, and whether they leave room for "short-methods" in the future.
Cheers,
Michał Marcin Brzuchalski
The short-functions RFC is entirely separate. The syntax choices in both that RFC and this one were made to ensure that they don't conflict with each other, and the resulting syntax meaning is consistent across the language. The implementations are independent and should in no way conflict.
(The short-functions RFC would have enabled short-methods too. It was purely a syntax sugar with no additional behavior.)
That's assuming the attitude toward the short-function RFC ever changes enough in the future to make it worth trying again...
--Larry Garfield
Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.
First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.
My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?
They may sound like the same thing, but to me "short closure syntax"
(and a lot of the current RFC) implies that the new syntax is better for
nearly all closures, and that once it is introduced, the old syntax
would only really be there for compatibility - similar to how the []
syntax replaces array() and list(). If that is the aim, it's not enough
to assert that "the majority" of closures are very short; the syntax
should stand up even when used for, say, a middleware handler in a
micro-framework. As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.
On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"
To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.
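To make the scoping point concrete, here is a minimal sketch in current PHP (7.4+) contrasting the three behaviours discussed; the variable and closure names are illustrative only.

```php
<?php
$factor = 3;

// A plain anonymous function has its own scope: $factor is not visible inside it.
$isolated = function (int $n) {
    return isset($factor) ? $n * $factor : null;
};

// Explicit capture with use(): the outer variable is named at the declaration.
$explicit = function (int $n) use ($factor) {
    return $n * $factor;
};

// Arrow function: the same capture happens implicitly.
$implicit = fn (int $n) => $n * $factor;

var_dump($isolated(2)); // NULL
var_dump($explicit(2)); // int(6)
var_dump($implicit(2)); // int(6)
```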
I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.
This is also where comparison to other languages falls down: most
languages which capture implicitly for closures also merge scopes
implicitly at other times - e.g. global variables in functions; instance
properties in methods; or nested block scopes. Generally they also have
a way to opt out of those, and mark a variable as local to a function or
block; PHP does not, because it has always required an opt in.
Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.
As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.
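As a sketch of the status quo this suggestion responds to: PHP today has function scope but no block scope, so variables assigned inside a loop body remain defined afterwards. The function name below is hypothetical.

```php
<?php
function sumAndLeak(array $values): array
{
    $total = 0;
    foreach ($values as $value) {
        $doubled = $value * 2; // assigned inside the loop body
        $total += $doubled;
    }

    // No block scope: $value and $doubled are still defined after the loop.
    return [$total, $value, $doubled];
}

print_r(sumAndLeak([1, 2, 3])); // [12, 3, 6]
```

A block-scoping keyword of the kind suggested above would be a way to declare that such variables should not outlive their block; no such keyword exists in PHP at present.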
Regards,
--
Rowan Tommins
[IMSoP]
On Sat, Jun 11, 2022 at 11:14 PM Rowan Tommins rowan.collins@gmail.com
wrote:
Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture compactness
of short-closures ... Arnaud Le Blanc has now picked up the flag with an
improved implementation ... The RFC has therefore been overhauled
accordingly and is now ready for consideration.
They may sound like the same thing, but to me "short closure syntax"
(and a lot of the current RFC) implies that the new syntax is better for
nearly all closures, and that once it is introduced, the old syntax
would only really be there for compatibility - similar to how the []
syntax replaces array() and list(). If that is the aim, it's not enough
to assert that "the majority" of closures are very short; the syntax
should stand up even when used for, say, a middleware handler in a
micro-framework. As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.
The RFC does mention that the existing Anonymous Function Syntax remains
untouched and will not be deprecated. Whether the new syntax is better for
nearly all closures will be a personal choice. If the new syntax doesn't
suit, say, a middleware handler, then we still can:
- reach for the old syntax
- use invocable classes
- call another method or function which creates a brand new scope and then
returns a function/callable.
On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"
The previous discussions talked about use(*) or use(...) and most people I
know that would love this RFC to pass would also dislike that alternative.
It does not have the greatest asset for short closure: aesthetics. Maybe my
personal bubble is not statistically relevant, but this is where PHP
Internals falls short in surveying actual users of the language to help with
such matters. All I can say is that use(*) is not a replacement for the RFC.
To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.
Any attempt to make it explicit defeats the purpose of the RFC. The
auto-capturing means we don't have to write awkward code to access
variables. The only way we have to avoid awkward syntax (such as use
($var1, $var2)) is to declare an entire new invocable class and send the
parameters via the constructor. When many variables are involved, that may
still be a great option, but doing that just for 1 variable and 2 lines is
quite... sad. When I think of new accessors for this particular case, they
would either be innovative or verbose. If they are verbose, we already have
a syntax for that. If they are innovative, it would be an awkward
out-of-place situation that doesn't happen elsewhere in the language. Or I
lack the imagination to see a different result.
Ultimately, I see fn() as "an opt-in to not create a separate scope for a
function". PHP has several language constructs that may or may not create a
separate scope.
Delimited scope: function, method, class, procedural file
Shared scope: if, for, foreach, include, require and fn
Marco Aurélio Deleu
The RFC does mention that the existing Anonymous Function Syntax remains
untouched and will not be deprecated. Whether the new syntax is better for
nearly all closures will be a personal choice. If the new syntax doesn't
suit, say, a middleware handler, then we still can:
- reach for the old syntax
- use invocable classes
- call another method or function which creates a brand new scope and then
returns a function/callable.
Correct. If this RFC passes, there will be three equally supported syntaxes for creating closures:
    function ($a) use ($b) {
        return $a * $b;
    };
    fn ($a) => $a * $b;
    fn ($a) {
        return $a * $b;
    };
Which one is appropriate in a given situation is left up to developer judgement.
My own personal position would be:
- use fn => where possible
- use fn {} if going multi-line.
- if the body of the closure is more than ~3 lines and is not virtually the entire wrapping scope, it should be its own named function/method anyway, and the new first-class-callable syntax makes that nice and easy to use.
That is, I would probably not use the manual-capture syntax very often at all. However, if someone disagrees with me on case 3 it's still there for them if that's easier in context.
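A minimal sketch of the third case, assuming PHP 8.1+ for the first-class callable syntax. The function addTax and the figures are hypothetical, chosen only to show a longer body moved into a named function and then passed around as a Closure.

```php
<?php
// Hypothetical helper: the kind of multi-line body that might otherwise
// have been written inline as a closure.
function addTax(int $cents): int
{
    $ratePercent = 20;
    $tax = intdiv($cents * $ratePercent, 100);
    return $cents + $tax;
}

$prices = [1000, 2550];

// First-class callable syntax turns the named function into a Closure.
$withTax = array_map(addTax(...), $prices);

print_r($withTax); // [1200, 3060]
```

The same syntax works for methods, for example $this->addTax(...) inside a class.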
Whether the new syntax is viewed as "adding auto-capture to long closures" or "adding multi-line support to short closures" is, in the end, a mostly academic distinction with no practical difference. The resulting syntax is smack in the middle of the two existing ones. The original RFC from a year ago approached it from the first perspective; the rewritten RFC leans on the second. The use of both names is mostly a historical artifact of reusing the old URL. :-) The net result is the same.
--Larry Garfield
The RFC does mention that the existing Anonymous Function Syntax
remains untouched and will not be deprecated. Whether the new syntax
is better for nearly all closures will be a personal choice.
I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".
To put it a different way: imagine we had no closure support at all, and
decided that we needed two flavours, one with explicit capture and one
with implicit capture. Would we choose "function" and "fn" as keywords?
The previous discussions talked about use(*) or use(...) and most
people I know that would love this RFC to pass would also dislike that
alternative. It does not have the greatest asset for short closure:
aesthetics. [...] All I can say is that use(*) is not a replacement
for the RFC.
I think you're trying to have it both ways here: if you really believed
that the two syntaxes were going to live side by side, there would be no
reason for "aesthetics" to be any more important for one than the other.
Some people are of the opinion that automatic capture should always have
been the default, and the current syntax is a mistake. I'm fine with
that opinion, but I want people to be honest about it rather than
pretending they're just adding a new option for a narrow use case.
Any attempt to make it explicit defeats the purpose of the RFC.
That depends what you think the purpose of the RFC is, which is what I
want people to be honest about.
If the purpose is to replace long lists of captured variables, an
explicit "capture all" syntax like "use(*)" achieves that purpose
perfectly fine.
Ultimately, I see fn() as "an opt-in to not create a separate scope
for a function".
I disagree with both parts of this:
- I don't think users will see "fn" as an "opt-in", they'll see it as "the new normal", and "function" as a rare "opt-out" or a "legacy version".
- It does still create a separate scope, it just creates a nested scope, which combines two sets of variables, in a way that PHP currently never does.
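The existing single-expression arrow function already shows this nested-scope behaviour, so a minimal sketch with current syntax (PHP 7.4+) may help; the names are illustrative only.

```php
<?php
$count = 0;

// The arrow function reads $count from the enclosing scope...
$increment = fn () => ++$count;

$first = $increment();  // works on a captured copy
$second = $increment(); // each call starts from the same captured value

// ...but writes inside the closure never reach the outer variable.
var_dump($first, $second, $count); // int(1), int(1), int(0)
```

Reads fall through to the captured values while writes stay inside the closure, which is different from how if or foreach share the surrounding scope.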
Regards,
--
Rowan Tommins
[IMSoP]
On Sun, Jun 12, 2022 at 2:29 PM Rowan Tommins rowan.collins@gmail.com
wrote:
The RFC does mention that the existing Anonymous Function Syntax
remains untouched and will not be deprecated. Whether the new syntax
is better for nearly all closures will be a personal choice.
I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".
And to me that's not an argument to deny what people want.
To put it a different way: imagine we had no closure support at all, and
decided that we needed two flavours, one with explicit capture and one
with implicit capture. Would we choose "function" and "fn" as keywords?
I often don't indulge such hypotheticals because we will never truly be
able to make progress based on such an assumption. A breaking change that
changes how closures work is just not gonna happen. Given the current state
of the world we're in, what can we do to have a better DX on anonymous
functions?
The previous discussions talked about use(*) or use(...) and most
people I know that would love this RFC to pass would also dislike that
alternative. It does not have the greatest asset for short closure:
aesthetics. [...] All I can say is that use(*) is not a replacement
for the RFC.
I think you're trying to have it both ways here: if you really believed
that the two syntaxes were going to live side by side, there would be no
reason for "aesthetics" to be any more important for one than the other.
Some people are of the opinion that automatic capture should always have
been the default, and the current syntax is a mistake. I'm fine with
that opinion, but I want people to be honest about it rather than
pretending they're just adding a new option for a narrow use case.
Honestly I don't think it was a mistake. It was designed more than a decade
ago and there was no way of predicting the future. I've seen code written
20~10 years ago and I've seen code written 5~0 years ago. I think the best
decision was taken at the time it was taken and the world of development
has changed enough for us to make different decisions now.
It's not that I'm trying to have it both ways, I'm just not assuming my
view is the right one. I do believe that if such an RFC is approved, I will
almost never reach for function () use () anymore because I will prefer
the short syntax. If I need a new scope I will reach for an invocable
class. But that doesn't mean other teams/projects/people are forced to
agree or follow the same practices as me or my team.
Any attempt to make it explicit defeats the purpose of the RFC.
That depends what you think the purpose of the RFC is, which is what I
want people to be honest about.
If the purpose is to replace long lists of captured variables, an
explicit "capture all" syntax like "use(*)" achieves that purpose
perfectly fine.
If someone decides to implement function () use (*)
on a separate RFC, I
would abstain from that because it's not something I'm interested in using
and it doesn't address the aesthetic issue we have today. I just don't like
it being considered an alternative to the current RFC because it's not. The
purpose is to replace long lists of captured variables while addressing the
aesthetic issue caused by use (), which is the only place in the language
we use this construct.
It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.
Regards,
--
Rowan Tommins
[IMSoP]
--
Marco Aurélio Deleu
I honestly don't think this is how it will be perceived. If this syntax
is approved, people will see "fn" as the "new, better way" and
"function" as the "old, annoying way".
And to me that's not an argument to deny what people want.
I never said it was. I said that if that is what we expect, we should design the feature with that in mind, rather than relying on the older syntax as a crutch.
Given the current state in the world we're in, what can we
do to have a better DX on anonymous functions?
I already gave my answer to that: either add implicit capture as an opt-in to the current syntax; or add block scope and treat short closures as consistent with that.
Honestly I don't think it was a mistake. It was designed more than a decade
ago and there was no way of predicting the future. I've seen code written
20~10 years ago and I've seen code written 5~0 years ago. I think the best
decision was taken at the time it was taken and the world of development
has changed enough for us to make different decisions now.
I've seen that argument before, but it's not clear to me that anything has changed. Anonymous functions are used for roughly the same things that always were, so why are the arguments made when they were added, and again when short closures were discussed previously, no longer valid?
If someone decides to implement
function () use (*)
on a separate RFC, I
would abstain from that because it's not something I'm interested in using
That's fair enough. Just remember that that is your opinion of what is important, and others may have different views.
aesthetic issue caused by use (), which is the only place in the language
we use this construct.
That's like saying we only use the word "class" when declaring classes. It has slightly different syntax, but "use" is exactly the same principle as importing variables into scope with "global", or declaring them "static". It's entirely in keeping with how scope works in PHP.
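For readers less familiar with the three constructs being compared here, the following is a minimal sketch in current PHP, assuming it runs as a top-level script. The function and variable names are invented for illustration. In each case a keyword explicitly opts one named variable into the local scope.

```php
<?php
$greeting = 'hello';

// "global" imports a variable from the global scope by name.
function readGlobal(): string
{
    global $greeting;
    return $greeting;
}

// "static" declares a variable that keeps its value between calls.
function nextId(): int
{
    static $id = 0;
    return ++$id;
}

// "use" copies a variable from the declaring scope into the closure.
$shout = function () use ($greeting): string {
    return strtoupper($greeting);
};
```

Nothing is visible inside any of the three bodies unless it is named after one of those keywords, which is the consistency being argued for.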
It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.
No, it makes me want to make the new syntax as useful as possible.
Regards,
--
Rowan Tommins
[IMSoP]
On Sun, Jun 12, 2022 at 6:55 PM Rowan Tommins rowan.collins@gmail.com
wrote:
It seems to me that you agree that there is a chance the proposed syntax is
going to be perceived as better and people will not want to use the old
syntax anymore and that makes you not want to accept the RFC.
No, it makes me want to make the new syntax as useful as possible.
On the sentiment, we can agree. It just happens that from where I'm
standing, any change to the proposed syntax will make it less useful.
--
Marco Aurélio Deleu
Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.
First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.
To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.
I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.
There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or partially block-scoped depending on details. PHP is not one of them, but to someone coming from a language that does, PHP's way of doing things is just as weird and requires learning. The point here is that "which things create a scope and which don't" are not "intuitive" in any language. They're always language-idiomatic, and may or may not be internally consistent, which is the important part.
PHP is fairly internally consistent: functions and classes create a scope, nothing else does. This RFC doesn't change that one way or another, so it's not really any harder to learn. Plus, as noted, the fn
keyword becomes consistently the flag saying "auto-capture happens here, FYI", which is already the case as of 7.4.
Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.
As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.
That may be a useful feature on its own, especially for longer loops. I'm definitely open to discussing that. I don't think that is a prerequisite for a nicer lambda syntax, however, as I don't think the confusion potential is anywhere near as large as you apparently fear it is.
--Larry Garfield
Last year, Nuno Maduro and I put together an RFC for combining the multi-line capabilities of long-closures with the auto-capture compactness of short-closures ... Arnaud Le Blanc has now picked up the flag with an improved implementation ... The RFC has therefore been overhauled accordingly and is now ready for consideration.
First of all, thanks to all three of you for the work on this. Although
I'm not quite convinced yet, I know a lot of people have expressed
desire for this feature over the years.
To go back to the point about variable scope: right now, if you're in a
function, all variables are scoped to that function. With a tiny handful
of exceptions (e.g. superglobals), access to variables from any other
scope is always explicit - via parameters, "global", "use", "$this", and
so on. If we think that should change, we should make that decision
explicitly, not treat it as a side-effect of syntax.
I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time. And users don't "learn
to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.
There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or partially block-scoped depending on details. PHP is not one of them, but to someone coming from a language that does, PHP's way of doing things is just as weird and requires learning.
Working in Go now for several years I'd say one of its biggest foot guns that I consistently run into when doing code reviews and even in my own code is block-level scoping where one variable shadows the same named variable outside the block and the inner variable is updated when the intention was to update the outer variable.
In short, block level scoping is a convenience that does more harm than good. At least in my experience.
Thus I would highly recommend not adding block level variable scope to PHP where a block in the middle of a function shadows a variable outside the block, and that variable is used below the block, such as for a return value.
#jmtcw #fwiw
-Mike
... users don't "learn to recognize" that a loop doesn't hide all variables from the parent
scope; it would be very peculiar if it did.
There are languages that do, however. Some languages have block-scoped variables by default (such as Rust), or partially block-scoped depending on details.
That's not what the RFC example implies, though; it implies that someone
might expect $guests and $guestsIds to not be usable inside the
foreach loop, because they were declared outside it. I don't know of
any language where entering a loop creates a completely empty symbol
table, do you?
Whether or not $guest, having been declared inside the loop, is
visible after the loop is a completely different question, and one
that doesn't apply to closures - the content of the closure hasn't been
executed yet, so it is inevitably a black box to the code after it.
PHP is fairly internally consistent: functions and classes create a scope, nothing else does. This RFC doesn't change that one way or another...
The RFC fundamentally changes the rule that a function always creates a
new, empty scope. Every variable that is not local to that scope has
to be explicitly imported, one way or another.
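To make the rule under discussion concrete, here is a small sketch against current PHP (7.4 or later); the names are invented for the example. A named function starts from an empty scope, whereas today's single-expression arrow functions already see variables from the declaring scope.

```php
<?php
$outer = 'declared outside';

// A named function starts with an empty scope: $outer is not defined here.
function peek(): bool
{
    return isset($outer);
}

// An arrow function captures $outer by value because the expression mentions it.
$arrowPeek = fn (): bool => isset($outer);

var_dump(peek());       // false
var_dump($arrowPeek()); // true
```

The RFC would extend the second behaviour to multi-statement bodies, which is the change being debated.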
Every language I know where scopes do not start out empty has keywords
for marking which variables are definitely local to that block. That's
why I think a "var" or "let" equivalent is a natural accompaniment to
changing PHP's rules in this way.
Plus, as noted, the
fn
keyword becomes consistently the flag saying "auto-capture happens here, FYI", which is already the case as of 7.4.
It's a cute idea, but I don't think "if you miss most of the letters out
of a word, it means this special thing" is at all memorable. I've never
heard the syntax added in 7.4 called "fn functions", but I've frequently
heard it called "arrow functions", because what stands out to people is
the "=>". The keyword is only there because ($a)=>$b on its own would
collide with array syntax.
I would much rather see "fn" and "function" become synonyms, so that
"public fn foo() {}" is a valid method declaration, and "function() =>
$foo" is a valid arrow function.
Regards,
--
Rowan Tommins
[IMSoP]
On samedi 11 juin 2022 23:14:28 CEST Rowan Tommins wrote:
My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?
The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements. I wanted to give a name to the RFC, so that we could
refer to the feature by that name instead of the longer "auto-capture multi-
statement closures". But the auto-capture behavior is an important aspect we
want to inherit from Arrow Functions.
As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.
Currently the use()
syntax co-exists with auto-capture, but we could change
it so that an explicit use()
list disables auto-capture instead:
fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { } // Would capture nothing and disable auto-capture
On the other hand, "auto-capturing" could be seen as a feature in its
own right; something that users will opt into when it makes sense, while
continuing to use explicit capture in others. If that is the aim, the
proposed syntax is decidedly sub-optimal: to a new user, there is no
obvious reason why "fn" and "function" should imply different semantics,
or which one is which. A dedicated syntax such as use(*) or use(...)
would be much clearer. We could even separately propose that "fn" and
"function" be interchangeable everywhere, allowing combinations such as
"fn() use(...) { return $x; }" and "function() => $x;"
Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*)
to enable auto-capture would be a breaking change.
I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time.
Do you have an example where this would be a problem?
This is also where comparison to other languages falls down: most
languages which capture implicitly for closures also merge scopes
implicitly at other times - e.g. global variables in functions; instance
properties in methods; or nested block scopes. Generally they also have
a way to opt out of those, and mark a variable as local to a function or
block; PHP does not, because it has always required an opt in.
These languages capture/inherit in a read-write fashion. Being able to scope a
variable (opt out of capture) is absolutely necessary otherwise there is only
one scope.
In these languages it is easy to accidentally override/bind a variable from
the parent scope by forgetting a variable declaration.
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
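A minimal sketch of the by-value behaviour, using the arrow functions that already exist in PHP 7.4 and later (the variable names are made up for the example):

```php
<?php
$total = 10;

// $total is copied into the closure when the closure is created.
$bump = fn () => ++$total;

// The increment applies to the closure's own copy only.
$inside = $bump();

var_dump($inside); // int(11)
var_dump($total);  // int(10), the outer variable is untouched
```

Assigning to a captured variable therefore cannot rebind the variable in the declaring scope.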
Which leads me back to my constructive suggestion: let's introduce a
block scoping syntax (e.g. "let $foo;") as a useful feature in its own
right, before we introduce short closures.
I like this, especially if it also allows to specify a type. However, I don't
think it's needed before this RFC.
As proposed, users will need to have some idea of what "live variable
analysis" means, or add dummy assignments, if they want to be sure a
variable is actually local. With a block scoping keyword, they can mark
local variables explicitly, as they would in other languages.
Live-variable analysis is mentioned as part of implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.
I've updated the "Auto-capture semantics" section of the RFC.
Regards,
Arnaud Le Blanc
The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements.
That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
Currently the
use()
syntax co-exists with auto-capture, but we could change
it so that an explicit use()
list disables auto-capture instead:
fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { } // Would capture nothing and disable auto-capture
That's an interesting idea. I was coming from the other direction, but
it might make sense I guess.
By the way, the current RFC implies you could write this:
fn() use (&$myRef, $a) { $myRef = $a * $b; }
It's clear that $myRef is captured by reference, and $a by value; but
what about $b? Is it local to the closure as it would be in a "long"
closure, or implicitly captured by value as it would be with no "use"
statement?
It might be best for such mixtures to raise an error.
Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*)
to enable auto-capture would be a breaking change.
I'm not suggesting any change to arrow functions, just the ability to
write "use(*)" (or "use(...)") in all the places you can write
"use($foo)" today.
I don't think that introduces any problems, if you think of "fn" as an
alternative spelling of "function", and "=>" as expanding to "use(*) {
return"
I don't find the comparison to a foreach loop very convincing. Loops are
still only accessing variables while the function is running, not saving
them to be used at some indeterminate later time.
Do you have an example where this would be a problem?
I didn't say anything was a problem; I just said that the comparison
didn't make sense, because the scenarios are so different.
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
Live-variable analysis is mentioned as part of implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.
As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.
Currently, the only way to do so is to understand the "implementation
details" of which variables will be captured, and perhaps add dummy
statements like "$foo = null;" or "unset($foo);" to make sure of it.
Regards,
--
Rowan Tommins
[IMSoP]
The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements.
That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
As noted before, this is a distinction without a difference. The proposed syntax brings in one aspect of short-closures and one aspect of long-closures. Which you consider it "coming from" as a starting point is, in practice, irrelevant.
By the way, the current RFC implies you could write this:
fn() use (&$myRef, $a) { $myRef = $a * $b; }
It's clear that $myRef is captured by reference, and $a by value; but
what about $b? Is it local to the closure as it would be in a "long"
closure, or implicitly captured by value as it would be with no "use"
statement?
It might be best for such mixtures to raise an error.
The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.
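As a rough sketch of what that answer means, the closure below spells out the same captures in today's explicit syntax: $myRef by reference, $a and $b by value. It is only an approximation that runs on current PHP, not the RFC's implementation, and the starting values are invented for the example.

```php
<?php
$myRef = 0;
$a = 3;
$b = 4;

// Explicit equivalent of: fn() use (&$myRef, $a) { $myRef = $a * $b; }
// where, under the RFC, $b would be auto-captured by value.
$multiply = function () use (&$myRef, $a, $b): void {
    $myRef = $a * $b;
};

// Later changes to $b are not seen: it was copied when the closure was created.
$b = 100;
$multiply();

var_dump($myRef); // int(12)
```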
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
Live-variable analysis is mentioned as part of implementation details. It
should not be necessary to understand these details to understand the behavior
of auto-capture.
As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.
Currently, the only way to do so is to understand the "implementation
details" of which variables will be captured, and perhaps add dummy
statements like "$foo = null;" or "unset($foo);" to make sure of it.
There are two different issues you're raising here that almost seem to be contradictory.
- Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future. Note, however, that this risk is already present for short-closures, as the capture logic is the same.
Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally. However, based on my top-1000 survey, even today the vast majority of long-closures are only 2-4 lines long. I don't believe that makes it 2-4 times more likely, as it's still trivial for a developer to look at a 2 line closure and say "oh, I'm reusing that variable name, maybe that's not as clear as it could be."
- The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1. It's just a longer indicator to use the same logic, especially as it would also require the full "function" word. (Making fn and function synonyms sounds like it would have a lot more knock-on effects that feel very out of scope at present.)
--Larry Garfield
That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
As noted before, this is a distinction without a difference.
It's a difference in focus which is very evident in some of the comments
on this thread. For instance, Arnaud assumed that adding "use(*)" would
require a change to arrow functions, whereas that never even occurred to
me, because we're looking at the feature through a different lens.
By the way, the current RFC implies you could write this:
fn() use (&$myRef, $a) { $myRef = $a * $b; }
The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.
So it does. I find that extremely confusing; I think it would be clearer
to error for that case, changing the proposal to:
Short Closures support explicit by-reference capture with the |use|
keyword. Combining a short closure with explicit by-value capture
produces an error.
And the example to:
$a = 1;
fn () use (&$b) {
return $a + $b; // $a is auto-captured by value
// $b is explicitly captured by reference
}
Clearer syntax for this has been cited previously as an advantage of
use(*) or use(...):
$a = 1;
function () use (&$b, ...) { // read as "use $b by reference, everything else by value"
return $a + $b;
}
- Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future.
I tried to make very explicit what I was and was not disputing:
Whether the risk of these side effects is a big problem is up for
debate, but it's wrong to suggest they don't exist.
The RFC seems to be implying that the implementation removes the side
effects, but it does not, it is users paying attention to their code
which will remove the side effects.
Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally.
Our current short closures aren't just a single statement, they're a
single expression, and that's a really significant difference, because
it means to all intents and purposes they have no local scope. (You
can create and use a local variable within one expression, but it
requires the kind of twisted code that only happens in code golf.)
If there are no local variables, there is nothing to be accidentally
captured. That's why the current implementation doesn't bother
optimising which variables it captures - it's pretty safe to assume that
all variables in the expression are either parameters or captured.
- The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1.
The reason I think it would be better is because it is a more
intentional syntax: the author of the code is more likely to think
"I'm using an auto-capture closure, rather than an explicit-capture
closure, what effect will that have?" and readers of the code are more
likely to think "hm, this is using auto-capture, I wonder which
variables are local, and which are captured?"
Of course they can still guess wrong, but I don't think "fn" vs
"function" is a strong enough clue.
(Making fn and function synonyms sounds like it would have a lot more knock-on effects that feel very out of scope at present.)
Off the top of my head, I can't think of any, but I admit I haven't
tried hacking it into the parser to see if anything explodes.
Regards,
--
Rowan Tommins
[IMSoP]
That's one perspective. The other perspective is that the proposal is to
extend closure syntax to support automatic capture.
As noted before, this is a distinction without a difference.
It's a difference in focus which is very evident in some of the comments on this thread. For instance, Arnaud assumed that adding "use(*)" would require a change to arrow functions, whereas that never even occurred to me, because we're looking at the feature through a different lens.
By the way, the current RFC implies you could write this:
fn() use (&$myRef, $a) { $myRef = $a * $b; }
The RFC already covers that. $b will be auto-captured by value from scope if it exists. See the "Explicit capture" section and its example.
So it does. I find that extremely confusing; I think it would be clearer to error for that case, changing the proposal to:
Short Closures support explicit by-reference capture with the |use| keyword. Combining a short closure with explicit by-value capture produces an error.
And the example to:
$a = 1;
fn () use (&$b) {
return $a + $b; // $a is auto-captured by value
// $b is explicitly captured by reference
}
Clearer syntax for this has been cited previously as an advantage of use(*) or use(...):
$a = 1;
function () use (&$b, ...) { // read as "use $b by reference, everything else by value"
return $a + $b;
}
- Auto-capture could still over-capture without people realizing it. Whether this is actually an issue in practice (or would be) is hard to say with certainty; I'm not sure if it's possible to make an educated guess based on a top-1000 analysis, so we're all trying to predict the future.
I tried to make very explicit what I was and was not disputing:
Whether the risk of these side effects is a big problem is up for debate, but it's wrong to suggest they don't exist.
The RFC seems to be implying that the implementation removes the side effects, but it does not, it is users paying attention to their code which will remove the side effects.
Arguably it's less of an issue only because short-closures are, well, short, so less likely to reuse variables unintentionally.
Our current short closures aren't just a single statement, they're a single expression, and that's a really significant difference, because it means to all intents and purposes they have no local scope. (You can create and use a local variable within one expression, but it requires the kind of twisted code that only happens in code golf.)
If there are no local variables, there is nothing to be accidentally captured. That's why the current implementation doesn't bother optimising which variables it captures - it's pretty safe to assume that all variables in the expression are either parameters or captured.
- The syntactic indicator that "auto capture will happen". The RFC says "fn". You're recommending "use(*)". However, changing the indicator syntax would do nothing to improve point 1.
The reason I think it would be better is because it is a more intentional syntax: the author of the code is more likely to think "I'm using an auto-capture closure, rather than an explicit-capture closure, what effect will that have?" and readers of the code are more likely to think "hm, this is using auto-capture, I wonder which variables are local, and which are captured?"
Of course they can still guess wrong, but I don't think "fn" vs "function" is a strong enough clue.
"Strong enough" is an opinion, and it seems all who have commented have differing ones of those.
But maybe a memory device would help address (some of?) your concerns:
- fn() - It is SHORT and implicit. Short is CONVENIENT. Thus Short auto-captures variables because that is the most Convenient thing to do.
- function() - It is LONG. Long is more EXPLICIT. Thus Long requires Explicitly declaring variables, which is also more rigorous and robust.
Or for the TL;DR crowd:
- fn() => SHORT => CONVENIENT => Auto-captures
- function() => LONG => EXPLICIT => Requires declaration
Hope this helps. #fwiw
-Mike
On lundi 13 juin 2022 15:36:26 CEST Rowan Tommins wrote:
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
Live-variable analysis is mentioned as part of implementation details.
It should not be necessary to understand these details to understand the
behavior of auto-capture.
As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.
My choice of words in this reply was inaccurate when I said "In these
languages it is easy to accidentally override/bind a variable from
the parent scope by forgetting a variable declaration.", since "override" can
be interpreted in different ways.
What I meant here is that, in PHP, it is not possible to accidentally bind a variable
in the parent scope. This is actually impossible unless you explicitly capture
a variable by-reference. Do you agree with this?
Possible side-effects via object mutations are documented in the "No
unintended side-effects" section of the RFC. This assumes that property
assignments or method calls to captured objects would be intended, since these
assignments/calls would result in an error if the variable was not defined and
not captured. Do you have examples where assignments/calls would
unintentionally cause a side effect, with code you would actually write?
As noted in my other e-mail, by-value capture can still have side
effects, so users may still want to ensure that their code is free of
such side effects.
There are two ways for a closure to have a side-effect (already documented in
the RFC) :
- The closure explicitly captures a variable by reference, and binds it
- The closure mutates a value accessed through a captured variable. Mutable
values include objects and resources, but NOT scalars or arrays (since they
are copy-on-write).
In the first case, this is entirely explicit.
In the second case, the only thing you need to understand is that if you
access a variable you did not define, the variable is either undefined or
comes from the declaring scope. Accessing undefined variables is an error, so
it must come from the declaring scope.
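The second case can be sketched with today's arrow functions (PHP 7.4 or later), which use the same by-value capture. The variable names and values are invented for the example: an object reached through a captured variable is shared, while a captured array is an independent copy.

```php
<?php
$guest = new stdClass();
$guest->name = 'Alice';
$tags = ['vip'];

// The object handle is copied, but both copies point at the same object.
$rename = fn () => $guest->name = 'Bob';

// The array is copied (copy-on-write), so only the closure's copy grows.
$addTag = fn () => $tags[] = 'late';

$rename();
$addTag();

var_dump($guest->name); // string(3) "Bob"
var_dump($tags);        // still just ['vip']
```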
Your example uses isset(), which is valid code in most circumstances, but as
you said it's not particularly good code. Do you have other examples that come
to mind?
Currently, the only way to do so is to understand the "implementation
details"
I'm willing to make changes if that's true, because I definitely don't want
this to be the case.
- The closure mutates a value accessed through a captured variable. Mutable
values include objects and resources, but NOT scalars or arrays (since they
are copy-on-write).
It's not something that is used very often, so is often forgotten or
ignored, but there is technically an edge case where arrays are mutable:
they can contain references, and the references remain "live" even when
the array itself is passed by value.
So although unlikely, it is possible for a by-value closure to
over-write variables in other scopes: https://3v4l.org/dPZlI
// plain variable
$a = 42;
// array containing a reference
$b = [
'a' => &$a
];
// capture the array by-value
$f = function() use($b) {
// update the reference from inside the closure
$b['a'] = 69;
};
// call it
$f();
// observe that both the array and the plain variable now have the new value
var_dump($a, $b);
--
Rowan Tommins
[IMSoP]
(Sorry for double reply, hit send too soon)
Your example uses isset(), which is valid code in most circumstances, but as
you said it's not particularly good code. Do you have other examples that come
to mind ?
There is plenty of code out there in the real world that is not
particularly good, so I think it's a realistic example to think about.
The question is, do we think the language can and should help people
avoid that mistake?
- Would the explicitness of "use(...)" make it more likely someone would
spot it?
- Would people be more likely to write "let $guest=..." (or whatever
block-scope keyword we choose) than add "$guest = null;" at the
beginning of the closure?
I'm not totally sure, but we should always consider the impact on
less-expert users, not just the power-users who are the ones often
asking for new features.
Regards,
--
Rowan Tommins
[IMSoP]
Hi Arnaud,
Arnaud Le Blanc arnaud.lb@gmail.com wrote:
Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.
Cool, thanks.
Unfortunately, Arrow Functions already auto-capture today, so requiring a
use(*)
to enable auto-capture would be a breaking change.
I think there are two things that are making this conversation more
adversarial than it could be otherwise:
- Some people really want implicit auto-capture, while others are
deeply fearful of it. That comes more from the experience people have
from writing/reading different types of code leading them to have
different aesthetic preferences. Trying to persuade people that their
lived experience is wrong is hard.
- The current situation of having to list all variables is kind of
annoying when it doesn't provide much value, e.g. for stuff like:
function getCallback($foo, $bar, $quux)
{
    return function($x) use ($foo, $bar, $quux)
    {
        return $quux($foo, $bar, $x);
    };
}
Where the code that returns the closure is trivial, having to list out
the full set of captured variables does feel tedious, and doesn't provide
any value.
I realise it's annoying when people suggest expanding the scope of an
RFC, however...how would you feel about adding support for use(*) to
the current 'long closures'?
That way, people could choose between:
- Explicit capture of individual variables: function($x) use ($foo, $bar, $quux) {...}
- Explicit capture of all relevant variables: function($x) use (*) {...}
- Implicit capture of all relevant variables, and fewer letters: fn($x) {...}
People who don't want implicit capture would be able to tell their code
quality analysis tools to warn on any use of short closures (or
possibly better, warn when a variable has been captured). People who
do want implicit capture can use the short closures which always have
implicit capture.
cheers
Dan
Ack
Hi Larry, Arnaud,
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
Separating off some pedantism from the hopefully constructive comment,
I think some of the words in the RFC are a bit inaccurate:
A by-value capture means that it is not possible to modify any variables from the outer scope:
Because variables are bound by-value, the confusing behaviors often associated with closures do not exist.
Because variables are captured by-value, Short Closures can not have unintended side effects.
Those statements are true for scalar values. They are not true for objects:
class Foo
{
function __construct(public string $value) {}
function __toString()
{
return $this->value;
}
}
$a = new Foo('bar');
$f = fn() {
$a->value = 'explicit scope is nice';
};
print $a; // prints "bar"
$f();
print $a; // prints 'explicit scope is nice';
Yes, I know you can avoid these types of problems by avoiding
mutability, and/or avoiding capturing variables that represent
services, but sometimes those things are needed.
When you are capturing objects that can have side effects, making that
capture be explicit is quite nice (imo). I think the different
emphasis on capturing scalar values or objects might come down to a
difference in style of how different people use closures.
cheers
Dan
Ack
Hi Dan,
On Monday 13 June 2022 19:49:10 CEST Dan Ackroyd wrote:
Auto-capture in PHP is by-value. This makes this impossible. It also makes
explicit declarations non-necessary and much less useful.
Separating off some pedantism from the hopefully constructive comment,
I think some of the words in the RFC are a bit inaccurate:
A by-value capture means that it is not possible to modify any variables
from the outer scope:
Because variables are bound by-value, the confusing behaviors often
associated with closures do not exist.
Because variables are captured by-value, Short Closures can not have
unintended side effects.
Those statements are true for scalar values. They are not true for objects:
This is shown in the "No unintended side-effects" section of the RFC.
I agree that the choice of words is inaccurate, as "modify any variable" could
be interpreted not only as "bind a variable", but also as "mutate a value".
The section you have quoted is meant to show how by-value capture, which is
the default capture mode in all PHP closures, is less error prone than by-
variable/by-reference capture, by a huge margin. Especially since variable
bindings do not have side-effects unless a variable was explicitly captured
by-reference. Do you agree with this ?
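The distinction between "bind a variable" and "mutate a value" can be sketched with the closure syntax that exists today. This is only an illustration under the assumption that explicit by-value `use` behaves the same as auto-capture in this respect; the names `$count` and `$box` are made up for the example.

```php
<?php
$count = 1;
$box = new stdClass();
$box->value = 'bar';

$f = function () use ($count, $box): void {
    $count = 2;               // rebinds the closure's private copy only
    $box->value = 'changed';  // mutates the object both scopes point to
    $box = null;              // rebinding again: the outer $box is untouched
};
$f();

var_dump($count);       // int(1): the outer binding was not modified
var_dump($box->value);  // string(7) "changed": the shared object was mutated
```

By-value capture protects the outer variable bindings, but it copies object handles rather than objects, so property writes stay visible outside the closure.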
The "No unintended side-effects" section assumes that property assignments to
captured variables are intended side-effects. In your example, the programmer
intended to have a side effect because $a can only come from the declaring
scope (the code would result in an error otherwise):
$a = new Foo('bar');
$f = fn() {
$a->value = 'explicit scope is nice';
};
Do you have an example where the intent would be less obvious ? With code you
would actually write ?
Cheers,
Arnaud Le Blanc
Because variables are captured by-value, Short Closures can not have
unintended side effects.
Those statements are true for scalar values. They are not true for objects:
This is shown in the "No unintended side-effects" section of the RFC.
I'm confused by the last example:
$fn2 = function () use (&$a) { /* code with $a AND $b */ }
Isn't that missing a ", $b" in the use?
And like others, I also find that allowing mixing explicit by-value
capture with auto-capture is not really needed and even confusing; if
you "expect that explicitly capturing by value will be rare in
practice" you might as well forbid it?
Maybe you don't even need to add explicit [by-reference] capture to
short closures at all, but rather extend long closures so that we
can write things like:
$val1 = rand(); $val2 = rand(); $ref = null;
$fn1 = function () use (...) { /* do something with $val1 and $val2 */ };
$fn2 = function () use (&$ref, ...) { $ref = $val1 + $val2; };
(and even if not, at least mention in the RFC that it has been considered)?
By the way, what about arrow functions? e.g.
$fn = fn () use (&$ref) => $ref = $val1 + $val2; // assigns and returns
Would that be allowed? Is it really desirable?
Regards,
--
Guilliam Xavier
Because variables are captured by-value, Short Closures can not have
unintended side effects.
Those statements are true for scalar values. They are not true for objects:
This is shown in the "No unintended side-effects" section of the RFC.
I'm confused by the last example:
$fn2 = function () use (&$a) { /* code with $a AND $b */ }
Isn't that missing a ", $b" in the use?
And like others, I also find that allowing mixing explicit by-value
capture with auto-capture is not really needed and even confusing; if
you "expect that explicitly capturing by value will be rare in
practice" you might as well forbid it?
Arnaud and I discussed it, and we're going to drop the mix-autocapture-and-manual functionality. I was tepid on it to begin with, and it can be confusing. RFC will be updated soon.
By the way, what about arrow functions? e.g.
$fn = fn () use (&$ref) => $ref = $val1 + $val2; // assigns and returns
Would that be allowed? Is it really desirable?
I don't think it's really desirable. By-ref capture is unusual, probably even rarer in one-line closures (though I've not checked that specifically), references are usually a bad idea anyway, and in those unusual cases the long form is still there if you want to control everything.
--Larry Garfield
On 2022-06-13 at 14:57, Arnaud Le Blanc wrote:
On Saturday 11 June 2022 23:14:28 CEST Rowan Tommins wrote:
My main concern is summed up accidentally by your choice of subject line
for this thread: is the proposal to add short closure syntax or is it
to add auto-capturing closures?
The proposal is to extend the Arrow Functions syntax so that it allows
multiple statements. I wanted to give a name to the RFC, so that we could
refer to the feature by that name instead of the longer "auto-capture
multi-statement closures". But the auto-capture behavior is an important
aspect we want to inherit from Arrow Functions.
As such, I think we need additional features to opt
back out of capturing, and explicitly mark function- or block-scoped
variables.
Currently the use() syntax co-exists with auto-capture, but we could change
it so that an explicit use() list disables auto-capture instead:
fn () use ($a) { } // Would capture $a and disable auto-capture
fn () use () { } // Would capture nothing and disable auto-capture
I like this idea very much. In the RFC two variables are captured
explicitly and one implicitly.
$c = 1;
fn () use ($a, &$b) { return $a + $b + $c; }
I don't see the use case / value for not specifying $c while specifying
$a. I think it's much clearer when capturing variables to either implicitly
capture everything or list the ones that should be captured. One
only needs to think about which variables are listed, not the ones
that might be implicitly captured.
Of course, variables captured by reference will always need to be listed,
and if that is combined with capturing variables by value, those also need
to be listed.
Then there is this other proposal to enhance traditional anonymous
functions by allowing the syntax use(*), meaning capture everything.
Even if it's outside the scope of this RFC it could be mentioned in
"What about Anonymous Functions?" or "Future scope".
r//Björn L
Last year, Nuno Maduro and I put together an RFC for combining the
multi-line capabilities of long-closures with the auto-capture
compactness of short-closures. That RFC didn't fully go to completion
due to concerns over the performance impact, which Nuno and I didn't
have bandwidth to resolve.
Arnaud Le Blanc has now picked up the flag with an improved
implementation that includes benchmarks showing an effectively net-zero
performance impact, aka, good news as it avoids over-capturing.
The RFC has therefore been overhauled accordingly and is now ready for
consideration.
A little data:
I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.
Total closures: 20052
Total used variables: 11534
Avg capture per closure: 0.575
Avg statements per closure: 1.876
Used variable distribution (# of use variables => how many times that happens):
0 => 12833
1 => 4585
2 => 1667
3 => 591
4 => 198
5 => 98
6 => 43
7 => 16
8 => 9
9 => 6
10 => 2
11 => 4
Statement count distribution (# of statements => how many times that happens):
0 => 266
1 => 13134
2 => 2885
3 => 1598
4 => 818
5 => 429
6 => 284
7 => 176
8 => 125
9 => 88
10 => 48
11 => 58
12 => 25
13 => 27
14 => 14
15 => 16
16 => 13
17 => 7
18 => 3
19 => 7
20 => 4
21 => 5
22 => 3
23 => 2
24 => 3
26 => 2
27 => 1
29 => 1
30 => 1
35 => 1
36 => 1
42 => 1
44 => 1
48 => 1
59 => 1
69 => 1
103 => 1
122 => 1
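For anyone who wants to re-derive the averages from the two distributions, a minimal sketch follows. The histograms are copied from the figures above; the helper name `histogramMean` is made up for this example and is not part of the analyzer.

```php
<?php
// "# of use variables => number of closures", copied from the list above.
$useCounts = [
    0 => 12833, 1 => 4585, 2 => 1667, 3 => 591, 4 => 198, 5 => 98,
    6 => 43, 7 => 16, 8 => 9, 9 => 6, 10 => 2, 11 => 4,
];

// "# of statements => number of closures", copied from the list above.
$statementCounts = [
    0 => 266, 1 => 13134, 2 => 2885, 3 => 1598, 4 => 818, 5 => 429,
    6 => 284, 7 => 176, 8 => 125, 9 => 88, 10 => 48, 11 => 58,
    12 => 25, 13 => 27, 14 => 14, 15 => 16, 16 => 13, 17 => 7,
    18 => 3, 19 => 7, 20 => 4, 21 => 5, 22 => 3, 23 => 2, 24 => 3,
    26 => 2, 27 => 1, 29 => 1, 30 => 1, 35 => 1, 36 => 1, 42 => 1,
    44 => 1, 48 => 1, 59 => 1, 69 => 1, 103 => 1, 122 => 1,
];

// Hypothetical helper: weighted mean of a "value => occurrences" histogram.
function histogramMean(array $histogram): float
{
    $weighted = 0;
    foreach ($histogram as $value => $occurrences) {
        $weighted += $value * $occurrences;
    }
    return $weighted / array_sum($histogram);
}

printf("closures:       %d\n", array_sum($useCounts));
printf("avg captures:   %.3f\n", histogramMean($useCounts));
printf("avg statements: %.3f\n", histogramMean($statementCounts));
```

Both histograms sum to 20052 closures; the capture histogram gives 11534 / 20052, about 0.575 captures per closure, and the statement histogram gives about 1.876 statements per closure.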
Analysis:
- The bulk of closures close over nothing, so are irrelevant for us.
- The bulk of closures use only one statement. That means they could easily be short-lambdas today, and are likely just pre-7.4 code that no one has bothered to update.
- The overwhelming majority of the rest are 2-3 lines long. The dropoff after that is quite steep. (Approximately halving each time, with a few odd exceptions.)
- Similarly, most use clauses contain 1-2 variables, and the dropoff after that is also quite steep.
- There's some nitwit out there writing 122 line closures, and closing over 11 variables explicitly. Fortunately it looks like an extremely small number of nitwits. :-)
The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables, both easily small enough that there should be very little risk of developers getting confused by their own code. Based on the data above, I conclude that group is very much the typical case for closures already, and thus the risk of this syntax resulting in harder to follow code where developers get confused about what is imported and what isn't is very low.
--Larry Garfield
A little data:
I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.
Total closures: 20052
Total used variables: 11534
Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.
--
Mark Baker
A little data:
I used Nikita's project analyzer on the top 1000 projects to get a rough sense of how long-closures are used now. All usual caveats apply about such survey data. I was specifically looking at how many use statements a closure typically had, and how many statements it typically had. Mainly, I am interested in how common "really long closures where the developer is likely to lose track of what is and isn't closed over" are.
Total closures: 20052
Total used variables: 11534
Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.
New run to check for that:
Total used variables: 11534
ByRef used variables: 1833
So around 16% of used variables (1833 of 11534) are by-ref, and thus would need to be explicitly used even with the new syntax.
There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.
I... don't see what relevance that has? The potential for confusion is at the definition point, not call point. If a closure is used inline then those are the same place, but if they're not, it's only the definition point that is relevant at the moment.
--Larry Garfield
On Sunday 12 June 2022 19:54:06 CEST Mark Baker wrote:
Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.
Please note that auto-capture binds variables at function declaration. This is
the case in Arrow Functions, and is inherited by this RFC.
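A small sketch of that binding time, using only syntax available today (an arrow function and an explicit `use` list); the variable names are illustrative. A by-reference capture is included for contrast.

```php
<?php
$x = 1;

$arrow = fn () => $x;                           // auto-capture: value copied when the closure is created
$long  = function () use ($x) { return $x; };   // explicit by-value capture: same timing
$byRef = function () use (&$x) { return $x; };  // by-reference capture: follows later writes

$x = 2;

var_dump($arrow()); // int(1)
var_dump($long());  // int(1)
var_dump($byRef()); // int(2)
```

Only the by-reference closure observes the assignment made after the closures were declared.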
On Sunday 12 June 2022 19:54:06 CEST Mark Baker wrote:
Did many of those closures use "pass by reference" in the use clause,
because that's one real differentiator between traditional closures and
short lambdas. There's also the fact that use values are bound at the
point where the closure is defined, not where it's called (if they even
exist at all at that point), although that's probably more difficult to
determine.
Please note that auto-capture binds variables at function declaration. This is
the case in Arrow Functions, and is inherited by this RFC.
From a maintainer and code review aspect, I prefer the longer syntax
because it is 100% clear on which variables are being closed over and
utilized in the anonymous function. fn($x) => $x + $y is pretty clear
that $y is being pulled in from an outer scope but if you start
getting into longer ones, it can get non-obvious pretty quickly...
$func = fn($x) {
$y[] = $x;
// do some stuff
return $y;
};
If $y is pulled from the outside scope, it may or may not be
intentional but hopefully, it is an array. If anyone uses the name $y
outside the lambda, this code may subtly break.
That being said, I'd love this RFC broken into two RFCs, one for
generic auto-capturing and one for multi-line fn functions (just to
reduce some typing when refactoring). There are times when
auto-capturing can be useful for all lambdas, especially when writing
some custom middleware.
On Monday 13 June 2022 12:28:17 CEST Robert Landers wrote:
From a maintainer and code review aspect, I prefer the longer syntax
because it is 100% clear on which variables are being closed over and
utilized in the anonymous function. fn($x) => $x + $y is pretty clear
that $y is being pulled in from an outer scope but if you start
getting into longer ones, it can get non-obvious pretty quickly...
$func = fn($x) {
$y[] = $x;
// do some stuff
return $y;
};
If $y is pulled from the outside scope, it may or may not be
intentional but hopefully, it is an array. If anyone uses the name $y
outside the lambda, this code may subtly break.
This is true for any function that uses the array-append operator on an
undefined variable.
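To make the array-append behaviour concrete, here is a rough sketch using today's explicit `use` syntax to stand in for a captured `$y`. Whether the proposed auto-capture would actually capture `$y` in this pattern is not something this example establishes; the function and variable names are made up.

```php
<?php
// In any plain function, appending to an undefined variable creates a new array.
function collect(int $x): array
{
    $y[] = $x;
    return $y;
}
$fresh = collect(1);

// If an array named $y is captured by value, the append extends a copy of it.
$y = ['outer'];
$append = function (int $x) use ($y): array {
    $y[] = $x;
    return $y;
};
$captured = $append(1);
$outerAfter = $y; // the outer array is not modified by the closure

// If a non-array $y is captured, the same line fails at call time.
$y = 'oops';
$broken = function (int $x) use ($y) {
    $y[] = $x; // appending with [] to a string throws an Error
    return $y;
};
$failed = false;
try {
    $broken(1);
} catch (Error $e) {
    $failed = true;
}

var_dump($fresh, $captured, $outerAfter, $failed);
```

The first call yields a one-element array, the second yields the captured array plus the new element while leaving the outer array alone, and the third throws.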
The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables
The first half of that sentence I was expecting - although as I've
already said, I think the chosen syntax suggests strongly that the RFC
is really targeting all closures, not any subset of them.
The second half makes much less sense. If you are only importing 1 or 2
variables, is writing their names really that big a burden?
Several of the conversations I've had on this in the past have been very
explicitly about the burden of large numbers of captures; if that's
really as rare as you suggest, it makes me wonder why we're even bothering.
Regards,
--
Rowan Tommins
[IMSoP]
The primary target of this RFC is people writing 2-4 line closures that import 1-2 variables
The first half of that sentence I was expecting - although as I've
already said, I think the chosen syntax suggests strongly that the RFC
is really targeting all closures, not any subset of them.
The second half makes much less sense. If you are only importing 1 or 2
variables, is writing their names really that big a burden?
Several of the conversations I've had on this in the past have been very
explicitly about the burden of large numbers of captures; if that's
really as rare as you suggest, it makes me wonder why we're even bothering.
Regards,
Disclaimer: My own view, I cannot speak for Nuno or Arnaud.
If you're capturing a very large number of variables, then I would view that as a code smell. "Very large" is subjective, of course, and there's some context to it.
The two main use cases I see myself using are
A) 2-3 liners that use 1-3 variables from scope, so it's dead obvious what they are. In this case, the extra use clause doesn't really add much beyond visible noise.
B) An entire method body is a closure that is being returned, or inlined into an inTransaction() call or something like that. In this case, basically all method parameters would be captured, and it would be on the very previous line, so no matter how many there are (more than ~5 is probably a problem with the method, not with the closure), they're redundant and don't tell you anything that isn't already self-evident.
So the burden is in having to think about redundant syntax at all, plus having more redundant text that has to be read in the future. Even with use(*) or use(...) or whatever, that's better than the status quo but is still just more boilerplate that would have to be added/removed when switching from a one line short lambda (side note: This is the term I always use; I basically never use "arrow function". I don't know how typical that is) to a 2-line closure when refactoring.
--Larry Garfield
That RFC didn't fully go to completion due to concerns over the performance impact
I don't believe that is an accurate summary. There were subtle issues
in the previous RFC that should have been addressed. Nikita Popov
wrote in https://news-web.php.net/php.internals/114239
I'm generally in favor of supporting auto-capture for multi-line closures.
There are some caveats though, which this RFC should address:
Subtle side-effects, visible in debugging functionality, or through destructor
effects (the fact that a variable is captured may be observable). I think if
nothing else, the RFC should at least make clear that this behavior
is explicitly unspecified, and a future implementation may no longer capture
variables where any path from entry to read passes a write.
To be clear, I don't fully understand all those issues myself (and I
have just enough knowledge to know to be scared to look at that part
of the engine) but my understanding is that the concerns are not about
just performance, they are deep concerns about the behaviour.
It would produce a better discussion if the RFC document either said
how those issues are resolved, or detail how they are still
limitations on the implementation.
It also probably would have been better (imo) to create a new RFC
document. The previous RFC went to vote, even if the vote was
cancelled. Disk space is cheap. Having different (though similar) RFCs
under the same URL makes it confusing when trying to understand what
happened to particular RFCs.
cheers
Dan
Ack
On Sunday 12 June 2022 20:05:02 CEST Dan Ackroyd wrote:
That RFC didn't fully go to completion due to concerns over the
performance impact
I don't believe that is an accurate summary. There were subtle issues
in the previous RFC that should have been addressed. Nikita Popov
wrote in https://news-web.php.net/php.internals/114239
It would produce a better discussion if the RFC document either said
how those issues are resolved, or detail how they are still
limitations on the implementation.
To be clear, I don't fully understand all those issues myself (and I
have just enough knowledge to know to be scared to look at that part
of the engine) but my understanding is that the concerns are not about
just performance, they are deep concerns about the behaviour.
Thank you for pointing this out. Nikita was referring to side-effects of
capturing too many variables, and suggested to make the capture analysis
behavior explicitly unspecified in the RFC so that it could be changed
(optimized) later.
The new version of the RFC does the optimization.
Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.
Cheers
Arnaud Le Blanc
Following your comment, I have clarified a few things in the "Auto-capture
semantics" section. This includes a list of ways in which these effects can be
observed. These are really marginal cases that are not relevant for most
programs.
I'm not sure I agree that all of these are marginal, or with the way
you've characterised them...
Note that destructor timing is undefined in PHP, especially when
reference cycles exist.
Outside of reference cycles, which are pretty rare and generally easy to
avoid, PHP's destructors are entirely deterministic. Unlike in fully
garbage-collected languages, you can use a plain object to implement an
"RAII" pattern - e.g. the constructor locks a file and the destructor
unlocks it; or the constructor starts a transaction, and the destructor
rolls it back if not yet committed.
A related case is resource lifetime: file and network handles are
guaranteed to be closed when they go out of scope, and accidentally
taking an extra copy of their "value" can prevent that.
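As a rough illustration of the RAII point, the sketch below uses a made-up `LockGuard` class that only records when it is constructed and destructed, and an explicit `use` list to stand in for a capture. It shows the destructor running at scope exit in the plain case, and being delayed for as long as a closure holding the object is alive.

```php
<?php
// Hypothetical guard object: the constructor "acquires", the destructor "releases".
final class LockGuard
{
    public static array $log = [];

    public function __construct(private string $name)
    {
        self::$log[] = "acquire {$this->name}";
    }

    public function __destruct()
    {
        self::$log[] = "release {$this->name}";
    }
}

function withoutCapture(): void
{
    $guard = new LockGuard('a');
    LockGuard::$log[] = 'work a';
    // $guard goes out of scope on return, so its destructor runs here
}

function withCapture(): Closure
{
    $guard = new LockGuard('b');
    // By-value capture copies the object handle, so the closure keeps the guard alive.
    return function () use ($guard): void {
        LockGuard::$log[] = 'callback b';
    };
}

withoutCapture();
$callback = withCapture();
LockGuard::$log[] = 'after withCapture';
$callback();
unset($callback); // last reference to the closure, and therefore to the guard
LockGuard::$log[] = 'end';

print_r(LockGuard::$log);
```

In the second case the "release b" entry only appears after the closure itself is destroyed, which is the lifetime extension described above.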
It ends up capturing the same variables that would have been captured
by a manually curated use list.
This slightly muddles two different questions:
- Given a well-written closure, where all variables are either clearly
local or clearly intended to be captured, does the implementation do a
good job of distinguishing them?
- Given a badly-written closure, where variables are accidentally
ambiguous, what side-effects might the user experience?
The answer to question 1 seems to be yes, the implementation does a good
job, and that's good news, and thank you for working on it.
That is not the same, however, as saying that question 2 is never
relevant. Consider the following, adapted from an example in the RFC:
$filter = fn ($user) {
if ( $user->id !== -1 ) {
$guest = $repository->findByUserId($user->id);
}
return isset($guest) && in_array($guest->id, $guestsIds);
};
This is not particularly great code, but it works ... unless the parent
scope happens to have a variable named $guest, which will then be bound
to the closure, since there is a path where it is read before being
written. In this case, side effects include:
- The behaviour will change based on the captured value of $guest
- Any resources held by that value will be held until $filter is
destructed, rather than when $guest is destructed
Whether the risk of these side effects is a big problem is up for
debate, but it's wrong to suggest they don't exist.
Regards,
--
Rowan Tommins
[IMSoP]