Hi internals!
Welcome to another edition of "crazy PHP edge-cases you don't want to know
about".
I'd like to introduce a restriction that forbids performing dynamic calls
to scope introspection and modification functions. "Dynamic calls" here
means things like $fn(), call_user_func($fn) and array_map($fn, ...).
"Scope introspection functions" refers to the following functions that in
one way or another inspect or modify parent stack frames:
-
extract()
-
compact()
-
get_defined_vars()
-
parse_str()
with one arg -
mb_parse_str()
with one arg -
func_get_args()
-
func_get_arg()
-
func_num_args()
I'd like to introduce this restriction for a number of reasons, which will
be outlined in the following. There are essentially two core problems: A)
It is not clear how these functions should behave in this situation --
indeed I will show examples of behavior differing due to inconsequential
changes, differing between different PHP runtimes or versions and just
generally behaving crazily. B) These calls pose a stability problem (yay
segfaults) and violate assumptions of existing optimizations (yay more
segfaults).
A) The primary issue is that dynamic calls to these functions have unclear
behavior and may lead to some very odd behavior. They all fundamentally
work by inspecting higher stack frames, but don't agree on which frame they
should operate on.
Example #1:
namespace {
function test($a, $b, $c) {
var_dump(call_user_func('func_num_args'));
}
test(1, 2, 3);
}
namespace Foo {
function test($a, $b, $c) {
var_dump(call_user_func('func_num_args'));
}
test(1, 2, 3);
}
This code will print int(3) int(1) on PHP 7 and HHVM (and two times int(1)
on PHP 5). The reason is that in the non-namespaced case the number of
arguments of the test() frame is returned, while in the namespaced case the
number of arguments of the call_user_func()
frame is returned, because of
internal differences in stack frame management.
Example #2:
function test() {
array_map('extract', [['a' => 42]]);
var_dump($a);
}
test();
This will print int(42) on PHP 5+7, but result in an undefined variable on
HHVM. The reason is that HHVM will extract ['a' => 42] into the scope of
the array_map()
frame, rather than the test() frame. It does this because
HHVM implements array_map as a userland (HHAS) function, rather than an
internal function.
One might write this off as a bug in the HHVM implementation, but really
this illustrates a dichotomy between internal and userland functions with
regard to dynamic calls of these functions. Namely, if you were to
reimplement the array_map function in userland
function array_map($fn, $array) {
$result = [];
foreach ($array as $k => $v) {
$result[$k] = $fn($v);
}
return $result;
}
and then try the same snippet as Example #2, it would indeed extract the
array into the scope of array_map()
and not the calling test() function. I
hope this helps to further illustrate why calling these functions
dynamically is a problem: They will generally be executed in a different
scope than the one where the callback is defined. This means you can
actually arbitrarily modify the scope of functions that accept callbacks,
even though they were certainly not designed for this use. E.g. you can
switch the $fn callback in the middle of the array_map execution using
something like:
array_map('extract', [['fn' => ...]]);
Just imagine the possibilities of this newly discovered feature! But this
is only where it starts. PHP has a number of magic callbacks that may be
implicitly executed in all kinds of contexts. For example, what happens if
we use one of these in spl_autoload_register?
Example #3:
spl_autoload_register('parse_str');
function test() {
$FooBar = 1;
class_exists('FooBar');
var_dump($FooBar); // string(0) ""
}
test();
Now any invocation of the autoloader (here using class_exists, but can be
generalized to new or anything else) will create a variable for the class
name in the local scope (with value ""). Of course there's more fun to be
had here (tick functions!), but let's leave it at that and get to the next
point.
B) The second issue is stability. As might be expected, nobody has bothered
testing edge-cases of dynamic calls to these functions previously. Recently
two segfaults relating to this were found, see bug #71220 and bug #72102.
However, these are "just bugs". The more important issue is that these
dynamic calls to scope modifying functions go against assumptions in the
current optimizer. For example the following very simple script currently
crashes if opcache is enabled, because $i is incorrectly determined to be
an integer:
function test() {
$i = 1;
array_map('extract', [['i' => new stdClass]]);
$i += 1;
var_dump($i);
}
test();
This is, of course, a bug in the optimizer and not in PHP. However, if we
try to catch this kind of situation in the optimizer we will have to do
very pessimistic assumptions (especially if you consider things like the
spl_autoload_register example), for a "feature" nobody needs and that
doesn't work particularly well anyway (see A).
Due to all the aforementioned issues, I'd like to forbid these specific
dynamic calls in PHP 7.1. I highly doubt that this will have any BC impact.
A patch is available here: https://github.com/php/php-src/pull/1886
Does anybody see a problem with this change?
Thanks,
Nikita
Hi Nikita,
We already discussed this, and I told that I don't have strong opinion about this change.
From one pint of view, your are going to disable some legal code patterns.
function foo() {
$f = "func_num_args";
$n = $f();
}
array_map("export", [$_GET, $_POST]);
On the other hand, it's better to fix this problem once and forever (in 7.1 release).
I think it's better to keep this email as RFC, at least for documentation of BC break purpose.
I'm not sure if we need 2 week discussion + 2 weeks voting for this.
I think 1 week voting should be enough?
and I'm sure, it should pass.
Thanks. Dmitry.
Hi!
I think it's better to keep this email as RFC, at least for documentation of BC break purpose.
I'm not sure if we need 2 week discussion + 2 weeks voting for this.
I think 1 week voting should be enough?
It should be at least 2 weeks discussion, as it's a language change, as
for voting, depends on when it happens. End of May, for example, has
holidays in the US. But in general if RFC happens soon-ish, there's no
reason to rush.
And, of course, this does require an RFC - this is a substantial
behavioral change that breaks BC (however weird the broken cases are).
Stas Malyshev
smalyshev@gmail.com
I'm all in favour of this. It makes the language much more predictable and
amenable to optimisation and static analysis overall, at the cost of what
IMO is a very, very minor BC break.
To think that all this time, functions I've written which accept a callable
could be passed something like "extract" and depended on whatever behaviour
results. Eugh.
Hi internals!
Welcome to another edition of "crazy PHP edge-cases you don't want to know
about".I'd like to introduce a restriction that forbids performing dynamic calls
to scope introspection and modification functions. "Dynamic calls" here
means things like $fn(), call_user_func($fn) and array_map($fn, ...).
"Scope introspection functions" refers to the following functions that in
one way or another inspect or modify parent stack frames:
extract()
compact()
get_defined_vars()
parse_str()
with one argmb_parse_str()
with one argfunc_get_args()
func_get_arg()
func_num_args()
I'd like to introduce this restriction for a number of reasons, which will
be outlined in the following. There are essentially two core problems: A)
It is not clear how these functions should behave in this situation --
indeed I will show examples of behavior differing due to inconsequential
changes, differing between different PHP runtimes or versions and just
generally behaving crazily. B) These calls pose a stability problem (yay
segfaults) and violate assumptions of existing optimizations (yay more
segfaults).A) The primary issue is that dynamic calls to these functions have unclear
behavior and may lead to some very odd behavior. They all fundamentally
work by inspecting higher stack frames, but don't agree on which frame they
should operate on.Example #1:
namespace {
function test($a, $b, $c) {
var_dump(call_user_func('func_num_args'));
}
test(1, 2, 3);
}namespace Foo {
function test($a, $b, $c) {
var_dump(call_user_func('func_num_args'));
}
test(1, 2, 3);
}This code will print int(3) int(1) on PHP 7 and HHVM (and two times int(1)
on PHP 5). The reason is that in the non-namespaced case the number of
arguments of the test() frame is returned, while in the namespaced case the
number of arguments of thecall_user_func()
frame is returned, because of
internal differences in stack frame management.Example #2:
function test() {
array_map('extract', [['a' => 42]]);
var_dump($a);
}
test();This will print int(42) on PHP 5+7, but result in an undefined variable on
HHVM. The reason is that HHVM will extract ['a' => 42] into the scope of
thearray_map()
frame, rather than the test() frame. It does this because
HHVM implements array_map as a userland (HHAS) function, rather than an
internal function.One might write this off as a bug in the HHVM implementation, but really
this illustrates a dichotomy between internal and userland functions with
regard to dynamic calls of these functions. Namely, if you were to
reimplement the array_map function in userlandfunction array_map($fn, $array) {
$result = [];
foreach ($array as $k => $v) {
$result[$k] = $fn($v);
}
return $result;
}and then try the same snippet as Example #2, it would indeed extract the
array into the scope ofarray_map()
and not the calling test() function. I
hope this helps to further illustrate why calling these functions
dynamically is a problem: They will generally be executed in a different
scope than the one where the callback is defined. This means you can
actually arbitrarily modify the scope of functions that accept callbacks,
even though they were certainly not designed for this use. E.g. you can
switch the $fn callback in the middle of the array_map execution using
something like:array_map('extract', [['fn' => ...]]);
Just imagine the possibilities of this newly discovered feature! But this
is only where it starts. PHP has a number of magic callbacks that may be
implicitly executed in all kinds of contexts. For example, what happens if
we use one of these in spl_autoload_register?Example #3:
spl_autoload_register('parse_str');
function test() {
$FooBar = 1;
class_exists('FooBar');
var_dump($FooBar); // string(0) ""
}
test();Now any invocation of the autoloader (here using class_exists, but can be
generalized to new or anything else) will create a variable for the class
name in the local scope (with value ""). Of course there's more fun to be
had here (tick functions!), but let's leave it at that and get to the next
point.B) The second issue is stability. As might be expected, nobody has bothered
testing edge-cases of dynamic calls to these functions previously. Recently
two segfaults relating to this were found, see bug #71220 and bug #72102.
However, these are "just bugs". The more important issue is that these
dynamic calls to scope modifying functions go against assumptions in the
current optimizer. For example the following very simple script currently
crashes if opcache is enabled, because $i is incorrectly determined to be
an integer:function test() {
$i = 1;
array_map('extract', [['i' => new stdClass]]);
$i += 1;
var_dump($i);
}
test();This is, of course, a bug in the optimizer and not in PHP. However, if we
try to catch this kind of situation in the optimizer we will have to do
very pessimistic assumptions (especially if you consider things like the
spl_autoload_register example), for a "feature" nobody needs and that
doesn't work particularly well anyway (see A).Due to all the aforementioned issues, I'd like to forbid these specific
dynamic calls in PHP 7.1. I highly doubt that this will have any BC impact.
A patch is available here: https://github.com/php/php-src/pull/1886Does anybody see a problem with this change?
Thanks,
Nikita
Am 29.04.2016 um 11:48 schrieb Nikita Popov:
Welcome to another edition of "crazy PHP edge-cases you don't want
to know about".
I love and hate these edge cases at the same time :)
I'd like to introduce a restriction that forbids performing dynamic calls
to scope introspection and modification functions.
+1