[RFC] Explicit call-site pass-by-reference (again)

5 years ago by Levi Morrison via internals

Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
    if ($this->enabled) {
        return new Optional($f($this->data));
    } else {
        return $this;
    }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving $this->data to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.
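
To make the concern concrete, here is a minimal sketch that runs on current PHP. The Optional class below is an assumed, stripped-down stand-in for the one described above (its constructor, its properties, and the mapSafe name are illustrative and not taken from any real library). It shows the property being overwritten through a by-reference closure, next to the local-copy defence.

```php
<?php
// Hypothetical minimal Optional; only the shape of map() comes from the thread.
class Optional
{
    public $data;
    public $enabled;

    public function __construct($data = null, bool $enabled = true)
    {
        $this->data = $data;
        $this->enabled = $enabled;
    }

    // Passes the property itself, so a by-reference callable can overwrite it.
    public function map(callable $f): Optional
    {
        if ($this->enabled) {
            return new Optional($f($this->data));
        }
        return $this;
    }

    // Defensive variant: the callable can only reach the local copy.
    public function mapSafe(callable $f): Optional
    {
        if ($this->enabled) {
            $local = $this->data;
            return new Optional($f($local));
        }
        return $this;
    }
}

// A callable that (perhaps accidentally) takes its argument by reference.
$clobber = function (&$value) {
    $value = 'clobbered';
    return 2;
};

$a = new Optional(1);
$mappedA = $a->map($clobber);

$b = new Optional(1);
$mappedB = $b->mapSafe($clobber);
```

With these definitions, $a->data ends up holding the string written by the closure, while $b->data keeps its original value. Nothing at either call site hints at the difference, which is the gap a call-site marker is meant to close.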

5 years ago by Larry Garfield

Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
    if ($this->enabled) {
        return new Optional($f($this->data));
    } else {
        return $this;
    }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving $this->data to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.

If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days is objects, and thus the problem exists regardless of whether it's by value or by reference passing, adding steps to make pass-by-reference harder doesn't seem to help much.

--Larry Garfield

5 years ago by Christian Schneider

Am 21.02.2020 um 00:04 schrieb Larry Garfield larry@garfieldtech.com:

Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
    if ($this->enabled) {
        return new Optional($f($this->data));
    } else {
        return $this;
    }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving $this->data to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.

If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days is objects, and thus the problem exists regardless of whether it's by value or by reference passing, adding steps to make pass-by-reference harder doesn't seem to help much.

+1

The whole discussion about being worried about 'malicious' libraries altering your precious scalar values misses the fact that PHP is not a pure language: there are many ways a function can have side effects, and Larry pointed out one obvious one.
Speaking of language editions: trying to solve one obscure case (and one that is easily detectable by static analysis) by introducing such a big BC break could render a whole edition ineligible for a software project. So beware: features bundled in one (hypothetical) edition had better not break too many different things at the same time.

If you don't trust your library code then you're in deep trouble anyway.

Chris

5 years ago by Nikita Popov

On Fri, Feb 21, 2020 at 12:05 AM Larry Garfield larry@garfieldtech.com
wrote:

Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
    if ($this->enabled) {
        return new Optional($f($this->data));
    } else {
        return $this;
    }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving $this->data to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.

If $this->data is itself an object, then you have a concern for data
manipulation (spooky action at a distance) even if it's passed by value.
Given how much data these days is objects, and thus the problem exists
regardless of whether it's by value or by reference passing, adding steps
to make pass-by-reference harder doesn't seem to help much.

If you will allow me some exaggeration, what you're basically saying here
is that all the const / readonly / immutability features in (nearly) all
programming languages are useless, because they (nearly) always allow for
interior mutability in one way or another. "const" in JavaScript doesn't
allow you to rebind the object, but you can still modify the object. Same
with "final" in Java. Similar things hold in C/C++/Rust when it comes to
const pointers/references to structs that contain non-const
pointers/references. And of course, the "readonly" RFC for PHP that is
currently under discussion has the same characteristics.

What I'm trying to say here: All of these features do not guarantee
recursive immutability, but that doesn't render them useless in the least.
In fact, the outer-most layer is where immutability is the most important,
because there's a lot of difference between

$i = 0;
var_dump($i); // int(0)
foo($i);
var_dump($i); // array(7) { ... }
// WTF just happened???

and

$o = new Foo();
var_dump($o); // object(Foo) #42 { xxx }
foo($o);
var_dump($o); // object(Foo) #42 { yyy }
// Did something change in there? Doesn't really matter for this code!

One of the big differences is that by-reference passing can change the
type of the variable, while by-object passing cannot. It cannot even
change object identity.
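
The distinction drawn here can be checked on current PHP with a small sketch. The function names replaceValue and touchObject are made up for illustration; spl_object_id() is used only to compare object identity before and after the call.

```php
<?php
// By-reference parameter: the callee can rebind the caller's variable,
// including to a value of a completely different type.
function replaceValue(&$value): void
{
    $value = [1, 2, 3];
}

// Object parameter passed by value: the callee can mutate the object's
// state, but reassigning the parameter does not affect the caller.
function touchObject(object $o): void
{
    $o->state = 'yyy';
    $o = new stdClass(); // only rebinds the local parameter
}

$i = 0;
replaceValue($i);

$o = new stdClass();
$o->state = 'xxx';
$idBefore = spl_object_id($o);
touchObject($o);
$idAfter = spl_object_id($o);
```

After these calls $i holds an array rather than an int, whereas $o is still the same object (same id) with only its interior state changed.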

On a closing note: I don't think this RFC makes passing by reference
"harder" in any meaningful sense. Yes, you do need to write one extra
character. In exchange, every time you read code you will immediately see
that by-reference passing is used, here be dragons.

Regards,
Nikita

5 years ago by Matthew Brown

This proposal is great, but most PHP static analysis tools already do a
reasonable job of understanding by-reference assignment and detecting bugs
there (an exception is closure use by-reference checks, which is a
static-analysis no-man's land).

No static analysis tools catch your specific use-case, though.

On Thu, 20 Feb 2020 at 09:48, Levi Morrison via internals <
internals@lists.php.net> wrote:

Just chiming in to voice strong support for this RFC. This is a key
piece toward making PHP code statically analyzable. If it becomes
required at the call site, such as in an edition of the language, it
will significantly enhance the ability to reason about code and
probably make it more correct as well. As a small example, consider
this method on an Optional type class:

function map(callable $f): Optional {
    if ($this->enabled) {
        return new Optional($f($this->data));
    } else {
        return $this;
    }
}

The intent is to return a new optional or an empty one, but if you
pass a closure that accepts something by reference you can change the
original, which is not intended at all. For people who defend against
it, it requires saving $this->data to a local variable, then passing
in the local. Then if the user does a call-by-reference it will affect
the local, not the object's data.

5 years ago by Mark Randall

The RFC proposes to allow using a "&" marker at the call-site (in addition
to the declaration-site) when by-reference passing is used.

It's a solid +1 from me

I do think this is somewhere else that an "official" upgrade /
migration tool would be rather well-received, an easy mechanism to scan
a file / directory for standard extension functions with known reference
args and re-write them appropriately.

--
Mark Randall
marandall@php.net

5 years ago by Mike Schinkel

I'd like to start the discussion on the "explicit call-site
pass-by-reference" RFC again:
https://wiki.php.net/rfc/explicit_send_by_ref

If $this->data is itself an object, then you have a concern for data manipulation (spooky action at a distance) even if it's passed by value. Given how much data these days is objects, and thus the problem exists regardless of whether it's by value or by reference passing, adding steps to make pass-by-reference harder doesn't seem to help much.

--Larry Garfield

+1

The whole discussion about being worried about 'malicious' libraries altering your precious scalar values misses the fact that PHP is not a pure language: there are many ways a function can have side effects, and Larry pointed out one obvious one.
Speaking of language editions: trying to solve one obscure case (and one that is easily detectable by static analysis) by introducing such a big BC break could render a whole edition ineligible for a software project. So beware: features bundled in one (hypothetical) edition had better not break too many different things at the same time.

If you don't trust your library code then you're in deep trouble anyway.

A huge +1 to Nikita's RFC.

A noted -1 to both Larry's and Christian's objections. Why? Because the perfect should not be the enemy of a significant improvement for specific use cases, unless it can be shown that making the improvement rules out future perfection.

-Mike

5 years ago by Rowan Tommins

Hi internals,

I'd like to start the discussion on the "explicit call-site
pass-by-reference" RFC again:
https://wiki.php.net/rfc/explicit_send_by_ref

Hi Nikita,

Thanks for putting the case for this so clearly. My instinctive reaction is still one of frustration that the pain of removing call-site ampersands was in vain, and I will now be asked to put most of them back in. It's also relevant that users already find where & should and should not be used very confusing. There is a potential "PR" cost of this change that should be weighed against the advantages.

I'm also not very keen on internal functions being able to do things that can't be replicated on userland, and this RFC adds two: additional behaviour for existing "prefer-ref" arguments, and new "prefer-value" arguments.

My current opinion is that I'd rather wait for the details of out and inout parameters to be worked out, and reap higher gains for the same cost. For instance, if preg_match could mark $matches as "out", I'd be more happy to run in a mode where I needed to add a call-site keyword.
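
For readers less familiar with the preg_match case, the following sketch shows how $matches behaves today: it is an output-only argument that the function creates and fills, yet the call site carries no marker of any kind. This only illustrates current behaviour; the "out" keyword discussed above is not existing syntax and is not used here.

```php
<?php
// $matches does not exist before the call; preg_match() receives it by
// reference, creates it, and fills it with the match results.
$found = preg_match('/(\d+)/', 'abc 42', $matches);

// $found is the number of full-pattern matches (0 or 1);
// $matches[0] is the full match, $matches[1] the first capture group.
$fullMatch = $matches[0];
$firstGroup = $matches[1];
```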

Regards,

--
Rowan Tommins
[IMSoP]

5 years ago by Mike Schinkel

Hi internals,

I'd like to start the discussion on the "explicit call-site
pass-by-reference" RFC again:
https://wiki.php.net/rfc/explicit_send_by_ref

My instinctive reaction is still one of frustration that the pain of removing call-site ampersands was in vain, and I will now be asked to put most of them back in.

That is a great example of what is known as a "sunk cost."

In summary: "A sunk cost is a cost paid in the past that is no longer relevant to decisions about the future."

It's also relevant that users already find where & should and should not be used very confusing.

One of the reasons it is confusing is that developers are currently required to use the ampersand in one place and not the other. Making it always used removes said confusion, as there would no longer be a reason to remember when and when not to use the ampersand.

There is a potential "PR" cost of this change that should be weighed against the advantages.

To say "We fixed something that in hindsight we've since determined was a problem." How is this a concern?

And when has the PHP community primarily worried about PR cost anyway, except when Hack started eating PHP's lunch in terms of performance?

I'm also not very keen on internal functions being able to do things that can't be replicated on userland, and this RFC adds two: additional behaviour for existing "prefer-ref" arguments, and new "prefer-value" arguments.

I used to have the same preference. And then I realized that languages that allow everything and do not withhold low-level functionality allow userland to create DSL-like extensions that can result in highly fragile and obtuse architectures. Just look at Ruby.

And yes that is an abstraction, but so is a generic concern about adding internal functions that cannot be leveraged in userland.

So what specific problems would having these enhancements cause for the language?

My current opinion is that I'd rather wait for the details of out and inout parameters to be worked out, and reap higher gains for the same cost. For instance, if preg_match could mark $matches as "out", I'd be more happy to run in a mode where I needed to add a call-site keyword.

This sounds like preferring perfect in the (potentially distant) future vs. much better today.

If this feature does not block some abstract vision for a perfect future and is something that can be delivered in the short term to solve real-world problems today, why stand in its way?

-Mike

5 years ago by Rowan Tommins

On Feb 21, 2020, at 5:20 PM, Rowan Tommins rowan.collins@gmail.com
wrote:
My instinctive reaction is still one of frustration that the pain of
removing call-site ampersands was in vain, and I will now be asked to
put most of them back in.

That is a great example of what is known as a "sunk cost."

Perhaps, yes. I freely admit it's an emotional reaction rather than a rational one.

One of the reasons it is confusing is because developers are currently
required to use the ampersand in one place and not the other. Making
it always used removes said confusion as there would no longer be a
reason to have to remember when and when not to use the ampersand
anymore.

Maybe. I think a larger part of it is that references themselves are a slightly confusing concept, and the fact that & looks like an operator of its own (and is often documented that way) but is really an annotation on other operators/commands. That is, the & in $foo = &$bar and return &$bar doesn't modify $bar, it modifies = and return, respectively.
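
A small sketch of that reading, runnable on current PHP: the & belongs to the assignment, turning "copy the value" into "bind both names to the same value". The variable names are only for illustration.

```php
<?php
$bar = 1;

// Reference assignment: "=&" binds $foo and $bar to the same value.
// Nothing about $bar itself is modified by the ampersand.
$foo = &$bar;
$foo = 2;      // writes through to $bar as well

// Plain assignment: $copy receives its own value.
$copy = $bar;
$copy = 3;     // $bar is unaffected
```

After this runs, $foo and $bar both hold 2, while $copy holds 3.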

Making the rules more logical and symmetrical would perhaps be more helpful to new users than it is to established users, particularly those who've known multiple versions of the language already.

There is a potential "PR" cost of this change that should be weighed
against the advantages.

To say "We fixed something that in hindsight we've since determined was
a problem." How is this a concern?

The concern is that the costs will be much more visible to users than the benefits, and they will resent the core developers pushing that requirement onto them, rather than thanking them for their hard work.

As I said, that's not an absolute reason not to do it, it's a cost to be weighed.

I'm also not very keen on internal functions being able to do things
that can't be replicated on userland, and this RFC adds two: additional
behaviour for existing "prefer-ref" arguments, and new "prefer-value"
arguments

So what specific problems would having these enhancements cause for the
language?

There are two problems I have with internal-only features in general: the inability to polyfill and extend, and the requirement for a separate mental model.

As an example of the first, the RFC mentions using call_user_func with a call-site annotation to forward the parameter by reference. The reason for allowing that also applies to a user-defined wrapper like call_with_current_user or call_with_swapped_parameters, but there's no syntax for those to be marked "prefer-val".

As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour. To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.

My current opinion is that I'd rather wait for the details of out and
inout parameters to be worked out, and reap higher gains for the same
cost. For instance, if preg_match could mark $matches as "out", I'd be
more happy to run in a mode where I needed to add a call-site keyword.

This sounds like preferring perfect in the (potentially distant) future
vs. much better today.

No, it's preferring to hold out for a little bit more value to weigh against my evaluation of the cost.

This is, when followed through to its conclusion of mandatory marking, a disruptive change to every piece of code, so we need to decide if the disruption is worth it.

It's also the second change in the same place, and we should be sure that we've got it right this time, and won't require a third change in the near future. For instance, if out parameters were added, would the same line of code end up going from optional &, to forbidden &, to mandatory &, to mandatory "out"?

I'm not strongly against the idea, but the advantages just don't feel quite strong enough, so if I had a vote, I'd currently be inclined to vote no.

Regards,

--
Rowan Tommins
[IMSoP]

5 years ago by Mike Schinkel

One of the reasons it is confusing is because developers are currently
required to use the ampersand in one place and not the other. Making
it always used removes said confusion as there would no longer be a
reason to have to remember when and when not to use the ampersand
anymore.

Maybe. I think a larger part of it is that references themselves are a slightly confusing concept, and the fact that & looks like an operator of its own (and is often documented that way) but is really an annotation on other operators/commands. That is, the & in $foo = &$bar and return &$bar doesn't modify $bar, it modifies = and return, respectively.

Making the rules more logical and symmetrical would perhaps be more helpful to new users than it is to established users, particularly those who've known multiple versions of the language already.

You call out the use of the ampersand being viewed as an operator acting on a variable as problematic, but that is already baked into current PHP, not going to change any time soon if ever, and is orthogonal to this RFC.

So whether or not people find the ampersand operator confusing is irrelevant to the debate posed by Nikita's RFC over whether we should make the use of the ampersand for passing by reference more consistent.

There is a potential "PR" cost of this change that should be weighed
against the advantages.

To say "We fixed something that in hindsight we've since determined was
a problem." How is this a concern?

The concern is that the costs will be much more visible to users than the benefits, and they will resent the core developers pushing that requirement onto them, rather than thanking them for their hard work.

As I said, that's not an absolute reason not to do it, it's a cost to be weighed.

Nikita's RFC proposes that the ampersand would be optional at the calling site, so is it really a concern that developers will resent something that is optional?

Yes, Nikita mentioned that a future "edition" might make it a requirement, but even then it will still be optional, since developers can choose not to use the new edition, and I think the resentment will come more from the concept of forcing an "edition" on developers than from any specific feature. Note that I plan to post soon about how I think we can alleviate that.

So we can debate the PR "cost" of requiring ampersands at the call site when the requiring RFC is on the table.

As a side note, I remember thinking "WTF?!?" when the requirement to use an ampersand at the calling site was removed. It is possible your analysis of PR cost is discounting the potentially large number of people who will think adding it back is a good thing.

I'm also not very keen on internal functions being able to do things
that can't be replicated on userland, and this RFC adds two: additional
behaviour for existing "prefer-ref" arguments, and new "prefer-value"
arguments

So what specific problems would having these enhancements cause for the
language?

There are two problems I have with internal-only features in general: the inability to polyfill and extend, and the requirement for a separate mental model.

As an example of the first, the RFC mentions using call_user_func with a call-site annotation to forward the parameter by reference. The reason for allowing that also applies to a user-defined wrapper like call_with_current_user or call_with_swapped_parameters, but there's no syntax for those to be marked "prefer-val".

Let's analyze.

In this case there does not appear to be a need for "prefer-val." And Nikita's RFC adds functionality we currently do not have, the ability to pass by reference through call_user_func(), so that is a win over the status quo, as userland gains a feature that was previously internal-only:

<?php
// Uses the call-site & proposed by the RFC; not valid in current PHP.
function current_user():int {
    return 1;
}
function foobar( int $current_user, int &$foo, int $bar ) {
    $foo++;
    $bar++;
}
function call_with_current_user(Callable $callable, int &$foo, int $bar ) {
    return call_user_func( $callable, current_user(), &$foo, $bar );
}
$foo = 0;
$bar = 0;
call_with_current_user( 'foobar', &$foo, $bar );
echo $foo; // prints 1
echo $bar; // prints 0
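
For comparison, here is a sketch of how the same wrapper can be written on current PHP, without the proposed call-site marker. It assumes the wrapper invokes the callable directly instead of going through call_user_func(), which passes its forwarded arguments by value. The function names are reused from the example above.

```php
<?php
function current_user(): int
{
    return 1;
}

function foobar(int $current_user, int &$foo, int $bar): void
{
    $foo++;
    $bar++;
}

// The wrapper declares $foo by reference and calls the callable directly,
// so the reference is forwarded to foobar() without call_user_func().
function call_with_current_user(callable $callable, int &$foo, int $bar)
{
    return $callable(current_user(), $foo, $bar);
}

$foo = 0;
$bar = 0;
call_with_current_user('foobar', $foo, $bar);
```

Here $foo is incremented and $bar is not, but nothing at the final call site tells the reader which of the two can change.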

Now if you want to use call_user_func_array() then I agree we are missing functionality. Consider this example which envisions a new function called func_is_byref_arg(). Yes the function name is a horrible name but consistent with existing functions for related purposes:

<?php
// Sketch only: relies on the RFC's call-site & and on the hypothetical
// func_is_byref_arg(), neither of which exists in current PHP.
function current_user():int {
    return 1;
}
function foobar( int $current_user, int ...&$args ) {
    foreach( $args as $i => $f ) {
        $args[$i]++;
    }
}
function call_with_current_user(Callable $callable, int ...&$args ) {
    array_unshift(&$args,current_user());
    $temp_args = $args;
    $result = call_user_func_array( $callable, &$temp_args );
    foreach( $temp_args as $i => $t ) {
        if ( func_is_byref_arg( $i, $args ) ) {
            // Copy back only the element that was passed by reference.
            $args[$i] = $temp_args[$i];
        }
    }
    return $result;
}
$foo = 0;
$bar = 0;
$baz = 0;
call_with_current_user( 'foobar', &$foo, $bar, $baz );
echo $foo; // prints 1
echo $bar; // prints 0
echo $baz; // prints 0

In my example func_is_byref_arg($pos[,$variadic_arg]):bool accepts one parameter if you are checking for by-ref positionally, and two if you are introspecting a variadic parameter.

So I argue we should fill in the holes of an RFC that introduces a feature to help developers write more robust code, instead of declining the RFC for imperfections in its first draft.

As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour.

To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.

I am unclear how the optional ampersand at the call site will change the behavior.

As I understand the RFC, the behavior will still be driven by the ampersand at the declaration site. The presence or absence of an ampersand at a call site will merely be decoration that allows developers to better convey their intent.

Can you please give an example of how this RFC would change behavior at call site compared to a call site where the ampersand did not exist, given the behavior of this RFC?

My current opinion is that I'd rather wait for the details of out and
inout parameters to be worked out, and reap higher gains for the same
cost. For instance, if preg_match could mark $matches as "out", I'd be
more happy to run in a mode where I needed to add a call-site keyword.

This sounds like preferring perfect in the (potentially distant) future
vs. much better today.

No, it's preferring to hold out for a little bit more value to weigh against my evaluation of the cost.

This is, when followed through to its conclusion of mandatory marking, a disruptive change to every piece of code, so we need to decide if the disruption is worth it.

That is disingenuous. The RFC does not require mandatory use, period.

The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.

Further, your cost analysis does not appear to consider the cost of status quo and this RFC's ability to reduce that cost.

Using Nikita's RFC example there is a potential real-world cost to getting the following wrong in a userland project:

$ret = array_slice($array, 0, 3);
$ret = array_splice($array, 0, 3);

With Nikita's RFC developers could choose to start using the ampersand at the calling site for these types of functions. Let's consider that I write the following:

array_slice(&$array, 0, 3);
array_splice(&$array, 0, 3);

With this RFC (I assume) an error could be generated on array_slice(&$array, 0, 3), saying that I cannot pass the array by reference, because array_slice() takes its array by value. Today we don't get that. This alone could reduce errors that I have seen in source code and have admittedly committed myself.

Said succinctly, there is a (IMO significant) cost to doing nothing that your analysis appears to ignore.
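
The difference between the two functions is easy to check on current PHP. The sketch below uses only the documented behaviour of array_slice() (array taken by value) and array_splice() (array taken by reference); the variable names are illustrative.

```php
<?php
$array = [1, 2, 3, 4, 5];

// array_slice() returns the selected elements and leaves $array untouched.
$slice = array_slice($array, 0, 3);
$afterSlice = $array;

// array_splice() returns the removed elements AND removes them from $array.
$removed = array_splice($array, 0, 3);
$afterSplice = $array;
```

Both calls return the same three elements, so mixing the two up only shows later, when the source array turns out to be shorter than expected.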

It's also the second change in the same place, and we should be sure that we've got it right this time, and won't require a third change in the near future.

I don't particularly see a problem with requiring a third change in the future. Hindsight is a wonderful clarifier. And I believe elsewhere you have been debating me over the need for incremental change. Caveat emptor.

For instance, if out parameters were added, would the same line of code end up going from optional &, to forbidden &, to mandatory &, to mandatory "out"?

My view is that we should actually hash those concerns out and move forward rather than state them in the abstract and let the fact that legitimate concerns might exist derail an improvement to the language.

Since there are not infinite possibilities, let's just address your specific concerns here. My straw man proposal is that if an out keyword is added, then a developer could use either out or & but not both. Then in a future "edition" of PHP we could disallow & if enough people agree that is better. Or we could leave it as either/or. Allowing the ampersand at a call site today does not block a potential future out keyword AFAICT.

For me I don't care which it is as long as there is a calling site notation that allows a developer to write code indicating intent and for other developers to read code and see that intent.

Status quo waiting for some future potential that may not arrive for years does not get us there in the near term, but Nikita's RFC would.

I'm not strongly against the idea, but the advantages just don't feel quite strong enough, so if I had a vote, I'd currently be inclined to vote no.

Heh. My vote would be yes. And since neither of us has a vote I guess it would be applicable to say they cancel each other out. Or not. :-D

-Mike

5 years ago by Rowan Tommins

Hi Mike,

First, I'd just like to reiterate that I absolutely see benefits in this
proposal, and am definitely not campaigning for it to be abandoned as a
bad idea. Like with any proposal, we have to weigh those benefits
against the costs, and my current personal opinion is that the scales
come down very slightly on the cost side.

I will also just say that you have made some valid points about
different ways people might perceive this change, and my fears on that
score may be overblown.

The RFC does not require mandatory use, period.

The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.

The RFC states very clearly that the full benefit of the change will
only be realised by making the markers mandatory in some way, and
includes specific discussion of how that might be introduced. Put
simply, tools (and even humans) get most from knowing that a particular
line of code won't pass anything by reference, and optional markers
can't guarantee that.

I am analysing the proposal on that basis, just as I would analyse a
proposed deprecation on the basis that the deprecated feature will one
day be removed.

If we analyse it on the basis of it never becoming mandatory, we have
to adjust our analysis of both costs and benefits.

Regarding prefer-ref and prefer-val:

function call_with_current_user(Callable $callable, int &$foo, int $bar ) {
    return call_user_func( $callable, current_user(), &$foo, $bar );
}

If you define the function this way, all callers are required to pass
the parameter by reference. That immediately means that this is a fatal
error:

call_with_current_user('foobar', 42, 42);

Internal functions have the magical ability to accept both literal
values and reference variables, whereas userland functions have to
choose one or the other.
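
A rough illustration of that asymmetry on current PHP follows. The userland function bump is hypothetical, and it is called through a variable so that the failure surfaces as a catchable Error at run time rather than depending on how a given PHP version reports it for a direct call. array_multisort() is used as the internal example on the assumption that its array arguments are declared prefer-ref.

```php
<?php
// Userland: a by-reference parameter rejects a literal argument.
function bump(int &$n): void
{
    $n++;
}

$caught = null;
$fn = 'bump';
try {
    $fn(42); // a literal cannot be bound to a reference parameter
} catch (\Error $e) {
    $caught = $e;
}

// Internal prefer-ref: the same function accepts a variable (sorted in
// place) as well as a literal (sorted copy is simply discarded).
$data = [3, 1, 2];
array_multisort($data);
$literalOk = array_multisort([3, 1, 2]);
```

There is no userland declaration syntax that accepts both kinds of argument in this way, which is the gap being described.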

As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour.

To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.
I am unclear how the optional ampersand at the call site will change the behavior.

I was referring to this line in the RFC:

If the argument is a prefer-ref argument of an internal function, then
adding the |&| annotation will pass it by reference, while not adding
it will pass it by value. Outside this mode, the passing behavior
would instead be determined by the VM kind of the argument operand.

That means that for any function implemented internally as "prefer-ref",
the user can now choose whether their variable will be overwritten by
the function. I don't know exactly which functions this would affect,
because as far as I know, the manual doesn't have a standard way to
annotate "prefer-ref". Which is kind of my point: it's magic behaviour
which sits outside most people's understanding of the language.

I don't particularly see a problem with requiring a third change in
the future. Hindsight is a wonderful clarifier. And I believe
elsewhere you have been debating me over the need for incremental
change. Caveat emptor.

The distinction I would make is between incremental change, and
contradictory change. If we later introduce out parameters in a way
that's compatible with call-site &, that would indeed be incremental
change; the effort spent adding & would move code closer to the final
state. If we end up introducing call-site "out", the effort spent adding
& will simply be compounded with the effort spent adding "out".

Predicting the future is a mug's game, but it's at least worth exploring
some possible futures, and how decisions now might help or hinder them.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by Mike Schinkel

Hi Mike,

First, I'd just like to reiterate that I absolutely see benefits in this proposal, and am definitely not campaigning for it to be abandoned as a bad idea. Like with any proposal, we have to weigh those benefits against the costs, and my current personal opinion is that the scales come down very slightly on the cost side.

I will also just say that you have made some valid points about different ways people might perceive this change, and my fears on that score may be overblown.

The RFC does not require mandatory use, period.

The "cost" you worry about will not exist unless and until a future RFC proposes to make it mandatory and that RFC is accepted.

The RFC states very clearly that the full benefit of the change will only be realised by making the markers mandatory in some way, and includes specific discussion of how that might be introduced. Put simply, tools (and even humans) get most from knowing that a particular line of code won't pass anything by reference, and optional markers can't guarantee that.

Fair point.

I am analysing the proposal on that basis, just as I would analyse a proposed deprecation on the basis that the deprecated feature will one day be removed.

If we analyse it on the basis of it never becoming mandatory, we have to adjust our analysis of both costs and benefits.

However, if you consider editions, it may not ever need to become mandatory and yet those who want it could still benefit.

Regarding prefer-ref and prefer-val:

function call_with_current_user(callable $callable, int &$foo, int $bar) {
    return call_user_func($callable, current_user(), &$foo, $bar);
}

If you define the function this way, all callers are required to pass the parameter by reference. That immediately means that this is a fatal error:

call_with_current_user('foobar', 42, 42);

Internal functions have the magical ability to accept both literal values and reference variables, whereas userland functions have to choose one or the other.
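
A minimal sketch of the userland side of this, using a hypothetical add_one() function rather than anything from the RFC. The call goes through a variable function name so that the failure surfaces at run time; on PHP 7 and 8 this is, to my knowledge, a thrown Error.

```php
<?php
// Hypothetical userland function with a by-reference parameter.
function add_one(int &$n): void
{
    $n++;
}

// Call dynamically so the engine decides the passing mode at run time.
$fn = 'add_one';

$count = 41;
$fn($count);        // a variable is bound by reference and modified

$error = null;
try {
    $fn(42);        // a literal cannot be bound to a reference
} catch (Error $e) {
    $error = $e;    // the call is rejected instead of silently copying
}
```

There is no way for add_one() to declare that it would accept either form, which is the asymmetry with internal prefer-ref functions being described.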

Uh, yeah I guess. But I would ask the question, why would you want to do that?

The reason we don't make the ampersand a requirement is a legacy concern. But if you are writing a new function there is no legacy concern. So it would seem a developer would want to force the ampersand.

Or is your point just that there is a list of possible options and you want the ability to use all options from that list regardless of whether a specific option has a valid use-case?

As an example of the second, even under strict settings, calls to certain internal functions will have an optional & at the call site, which changes their behaviour.

To those without knowledge of the core, those functions simply have to be remembered as "magic", because their behaviour can't be modelled as part of the normal language.

I am unclear how the optional ampersand at the call site will change the behavior.

I was referring to this line in the RFC:

If the argument is a prefer-ref argument of an internal function, then adding the & annotation will pass it by reference, while not adding it will pass it by value. Outside this mode, the passing behavior would instead be determined by the VM kind of the argument operand.

That means that for any function implemented internally as "prefer-ref", the user can now choose whether their variable will be overwritten by the function. I don't know exactly which functions this would affect, because as far as I know, the manual doesn't have a standard way to annotate "prefer-ref". Which is kind of my point: it's magic behaviour which sits outside most people's understanding of the language.

Sounds like the solution then is to update the documentation?

I don't particularly see a problem with requiring a third change in the future. Hindsight is a wonderful clarifier. And I believe elsewhere you have been debating me over the need for incremental change. Caveat emptor.

The distinction I would make is between incremental change, and contradictory change. If we later introduce out parameters in a way that's compatible with call-site &, that would indeed be incremental change; the effort spent adding & would move code closer to the final state. If we end up introducing call-site "out", the effort spent adding & will simply be compounded with the effort spent adding "out".

Predicting the future is a mug's game, but it's at least worth exploring some possible futures, and how decisions now might help or hinder them.

I do agree that contradictory changes are problematic.

However, in this case Nikita has weighed in and said "out" is unlikely to happen. So that seems to remove the concern about conflicts with out parameters?

-Mike

5 years ago by Nikita Popov

On Fri, Feb 21, 2020 at 11:20 PM Rowan Tommins rowan.collins@gmail.com
wrote:

On 20 February 2020 14:13:58 GMT+00:00, Nikita Popov nikita.ppv@gmail.com
wrote:

Hi internals,

I'd like to start the discussion on the "explicit call-site
pass-by-reference" RFC again:
https://wiki.php.net/rfc/explicit_send_by_ref

Hi Nikita,

Thanks for putting the case for this so clearly. My instinctive reaction
is still one of frustration that the pain of removing call-site ampersands
was in vain, and I will now be asked to put most of them back in. It's also
relevant that users already find where & should and should not be used very
confusing. There is a potential "PR" cost of this change that should be
weighed against the advantages.

I'm also not very keen on internal functions being able to do things that
can't be replicated on userland, and this RFC adds two: additional
behaviour for existing "prefer-ref" arguments, and new "prefer-value"
arguments.

I should say that this is a non-essential part of the RFC. I noticed that
this RFC provides a way to solve this problem, but if we don't think the
problem is worth solving, then we don't have to solve it.

The prefer-ref/prefer-val thing is indeed a bit peculiar. It's an artifact
of the current way of implicit by-reference passing, where the decision of
whether to pass by-value or by-reference has to be made based on an
"educated guess" at the call-site. That leaves us with always-val,
always-ref, prefer-val and prefer-ref as the possible passing modes. In the
explicit by-ref passing regime, the latter two consolidate, and we have
by-val, by-ref and "either" as the options, which is a lot more obvious.

But again, I can't say I'm fully convinced myself that this is really a
problem we need to solve. I don't really care about call_user_func() at all
(it is entirely obsoleted by $fn()), and now that I think about it,
__call() isn't really the right primitive to expose anyway. If you will
allow me a little digression...

Instead of having __call(), what we really should have is __get_method().
For a simple forwarding proxy, the implementation would look something like
this:

public function __get_method(string $name): ?Closure {
    if (method_exists($this->proxy, $name)) {
        return Closure::fromCallable([$this->proxy, $name]);
    }
    return null;
}

This solves multiple problems at once: First, it preserves the
signature of the method we're proxying to: This is better than the solution
in this RFC, because it preserves both by-ref argument passing and by-ref
returns, and can validate that properly (i.e. passing a non-ref to by-ref
will diagnose). Second, it makes is_callable() work precisely, because we
no longer have to assume that with __call() any method is callable. Third,
it makes Reflection work on the proxied method.
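
For illustration only: __get_method() does not exist in PHP, so the sketch below uses existing features to show the difference being described. The Counter and CallProxy classes are hypothetical. Forwarding through __call() receives its arguments by value, while a Closure obtained with Closure::fromCallable() keeps the by-reference signature of the target method.

```php
<?php
// Hypothetical target whose method takes its argument by reference.
final class Counter
{
    public function bump(int &$n): void
    {
        $n++;
    }
}

// Hypothetical forwarding proxy built on today's __call().
final class CallProxy
{
    private $target;

    public function __construct(Counter $target)
    {
        $this->target = $target;
    }

    public function __call(string $name, array $args)
    {
        // $args holds copies, so bump() only changes the local array.
        return $this->target->$name(...$args);
    }
}

$a = 1;
(new CallProxy(new Counter()))->bump($a);   // caller's $a is untouched

$b = 1;
$bump = Closure::fromCallable([new Counter(), 'bump']);
$bump($b);                                  // by-ref signature preserved
```

A __get_method() hook returning such a Closure would presumably behave like the second call, which is the signature-preservation benefit listed above.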

It's possible to recover normal __call() semantics from this approach by
writing something like this:

public function __get_method(string $name): Closure {
    return function(...$args) use($name) {
        // Normal __call() implementation in here.
    };
}

My current opinion is that I'd rather wait for the details of out and inout
parameters to be worked out, and reap higher gains for the same cost. For
instance, if preg_match could mark $matches as "out", I'd be more happy to
run in a mode where I needed to add a call-site keyword.

I believe we talked about this in some detail in the previous discussion on
this topic. My basic stance on in/out is that it's probably not worth the
complexity, unless it is part of an effort to eliminate references from PHP
entirely (which would be hugely beneficial). Unfortunately I don't really
see a clear pathway towards that. "out" parameters can remove one use-case
of references, and I can see how that would work both in terms of semantics
and implementation. The case of "inout" parameters is much more
problematic. While these can nominally work without references, I don't see
how they could do so efficiently (we would have to "move out" the value
from the original location to avoid COW). Similarly, I don't have any
answer to how &__get() and &offsetGet() would work without references.

Regards,
Nikita

5 years ago by Rowan Tommins

The prefer-ref/prefer-val thing is indeed a bit peculiar. It's an
artifact of the current way of implicit by-reference passing, where
the decision of whether to pass by-value or by-reference has to be
made based on an "educated guess" at the call-site. That leaves us
with always-val, always-ref, prefer-val and prefer-ref as the possible
passing modes. In the explicit by-ref passing regime, the latter two
consolidate, and we have by-val, by-ref and "either" as the options,
which is a lot more obvious.

Thanks, that's a good summary of how this all relates to the RFC. If
there is a use case for such a mode, perhaps we need a way to annotate
userland functions as "either", so that they too can take advantage of
the call-site annotation. In a sense, they'd be opting in to the pre-5.4
behaviour, for that particular parameter.

Instead of having __call(), what we really should have is __get_method().

That is a really interesting idea. The lack of function signature is
currently a big turn-off for using __call, because it means manually
recreating a lot of the unpacking and type checking that the language
would normally do for you.

I believe we talked about this in some detail in the previous
discussion on this topic. My basic stance on in/out is that it's
probably not worth the complexity, unless it is part of an effort to
eliminate references from PHP entirely (which would be hugely
beneficial). Unfortunately I don't really see a clear pathway towards
that. "out" parameters can remove one use-case of references, and I
can see how that would work both in terms of semantics and
implementation. The case of "inout" parameters is much more
problematic. While these can nominally work without references, I
don't see how they could do so efficiently (we would have to "move
out" the value from the original location to avoid COW). Similarly, I
don't have any answer to how &__get() and &offsetGet() would work
without references.

That's fair enough. I guess the reason I've fixated on them is that I'd
really like "out" parameters, independent of what else happens with
references, in order to get the clear signal of "variable is initialised
here" on a call like "preg_match($foo, $bar, out $matches)". That would
make out parameters more attractive to build APIs around, e.g. when you
want multiple strongly typed outputs from one call.
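
As a small illustration of the "variable is initialised here" case, this is how preg_match() behaves today; the pattern and subject string are made up for the example. Nothing at the call site shows that $matches is being created by the call, which is the signal an "out" keyword would add.

```php
<?php
// $matches does not exist before this line; preg_match() creates it
// through its by-reference third parameter.
$found = preg_match('/(\d+)-(\d+)/', 'range 10-20', $matches);

// $found is 1 and $matches now holds the full match plus both groups.
```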

Even if we can't eliminate references entirely, perhaps there's value in
reducing the use cases where they're necessary? A bit like how property
accessor syntax wouldn't allow us to remove __get and __set, but would
mean fewer cases where people needed to deal with them.

Regards,

--
Rowan Tommins (né Collins)
[IMSoP]

5 years ago by tyson andre

Hi internals,

One idea I had that was related to this (but not in the scope of this RFC)
would be adding a way to force the interpreter to treat an argument (variable, array field, property access, etc.) as being passed by value,
and refuse to modify it by reference (e.g. emit a notice and create a separate reference, or throw an Error).

i.e. instead of using the opcode SEND_VAR_EX, use a brand new opcode kind SEND_VAR_BY_VALUE that would do that, if the method signature was unknown.

Currently, PHP emits a notice and creates a temporary reference for non-variables, such as when passing the result of a function returning a non-reference to a reference parameter.
I assume SEND_VAR_BY_VALUE is equivalent to the new opcode needed if a subsequent RFC made call-site pass-by-reference mandatory in a given file.
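
A short sketch of the existing notice behaviour mentioned above, using a hypothetical make_list() helper. The error handler is only there to capture the notice text instead of printing it.

```php
<?php
// Hypothetical helper that returns an array by value (not by reference).
function make_list(): array
{
    return [3, 1, 2];
}

// Collect notices so the behaviour can be inspected afterwards.
$notices = [];
set_error_handler(function (int $errno, string $errstr) use (&$notices): bool {
    $notices[] = $errstr;
    return true;
});

// array_pop() takes its argument by reference; the function result is
// bound to a temporary, the call still succeeds, and a notice is raised.
$last = array_pop(make_list());

restore_error_handler();
```

The suggested SEND_VAR_BY_VALUE opcode would, as I read it, apply a similar "copy and complain" treatment to ordinary variables that the caller has marked as by-value.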

Possible syntaxes (only within argument lists):

$a->someMethod(*$foo);
$a->someMethod(&&$foo);
$a->someMethod(\$foo);
$a->someMethod(identity($foo));  // add a new keyword such as identity or value
$a->someMethod(=$foo);

Context: From the current thread's RFC https://wiki.php.net/rfc/explicit_send_by_ref

In fact, our inability to determine at compile-time whether a certain argument is passed by-value or by-reference is one of the most significant obstacles in our ability to analyze and optimize compiled code

I suggested this because it would be useful for optimizing frequently used code (after profiling it (e.g. with phpspy) and checking opcache debug output),
especially in frequently called functions/methods.

I'd prefer a declare directive (or edition) that makes the call-site marker mandatory for by-reference passing over this suggestion, though.

Tyson