Hi everyone
As you probably know, a common performance optimization in PHP is to
prefix global function calls in namespaced code with a \. In
namespaced code, relative function calls (meaning, not prefixed with
\, not imported and not containing multiple namespace components)
will be looked up in the current namespace before falling back to the
global namespace. Prefixing the function name with \ disambiguates
the called function by always picking the global function.
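To make the lookup rules above concrete, here is a minimal sketch of how PHP resolves calls today (before any change discussed in this thread). The namespace Foo, the shadowing Foo\strlen() and the helper demo() are made up for illustration.

```php
<?php
namespace Foo;

// Hypothetical local function that shadows the internal strlen().
function strlen(string $s): int
{
    return -1;
}

function demo(): array
{
    return [
        // Unqualified call: Foo\strlen() exists, so it wins today.
        'unqualified_shadowed' => strlen('hello'),
        // Fully qualified call: always the global strlen().
        'fully_qualified'      => \strlen('hello'),
        // Unqualified call without a local candidate: falls back to global.
        'unqualified_fallback' => strtoupper('hello'),
    ];
}
```

With current PHP, demo() returns -1, 5 and 'HELLO' for the three keys. Only the fully qualified call is known at compile time.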
Not knowing exactly which function is called at compile time has a
couple of downsides:
- It leads to the aforementioned double-lookup.
- It prevents compile-time-evaluation of pure internal functions.
- It prevents compiling to specialized opcodes for specialized
internal functions (e.g. strlen()).
- It requires branching for frameless functions [1].
- It prevents an optimization that looks up internal functions by
offset rather than by name [2].
- It prevents compiling to more specialized argument sending opcodes
because of unknown by-value/by-reference passing.
All of these optimizations are enabled by disambiguating the call.
Unfortunately, prefixing all calls with \, or adding a use function at
the top of every file is annoying and noisy. We recently got a feature
request to
change how functions are looked up [3]. The approach that appears to
cause the smallest backwards incompatibility is to flip the order in
which functions are looked up: Check in global scope first, and only
then in local scope. With this approach, if we can find a global
function at compile-time, we know this is the function that will be
picked at run-time, hence automatically enabling the optimizations
above. I created a PoC implementing this approach [4].
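The difference between the two lookup orders can be sketched in userland. This is only a simulation built on function_exists(), not the engine code from the PoC. The namespace LookupSketch and both resolver functions are hypothetical names.

```php
<?php
namespace LookupSketch;

// Hypothetical shadowing function, declared only so both candidates exist.
function strlen(string $s): int
{
    return -1;
}

// Current order: look in the namespace first, then fall back to global.
function resolve_local_first(string $namespace, string $name): ?string
{
    $local = $namespace . '\\' . $name;
    if (\function_exists($local)) {
        return $local;
    }
    return \function_exists($name) ? $name : null;
}

// Proposed order: look in global scope first, then in the namespace.
function resolve_global_first(string $namespace, string $name): ?string
{
    if (\function_exists($name)) {
        return $name;
    }
    $local = $namespace . '\\' . $name;
    return \function_exists($local) ? $local : null;
}
```

For 'strlen', the first resolver picks LookupSketch\strlen and the second picks the global strlen. For a name that only exists in the namespace, both return the namespaced function.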
Máté has kindly benchmarked the patch, measuring an improvement of
~3.9% for Laravel, and ~2.1% for Symfony
(https://gist.github.com/kocsismate/75be09bf6011630ebd40a478682d6c17).
This seems quite significant, given that no changes were required in
either of these two codebases.
There are a few noteworthy downsides:
- Unqualified calls to functions in the same namespace would be
slightly slower, because they now involve checking global scope first.
I believe that unqualified, global calls are much more common, so this
change should still result in a net positive. It's also possible to
avoid this cost by adding a use function to the top of the file.
- Introducing new functions in the global namespace could cause a BC
break for unqualified calls, if the function happens to have the same
name. This is unfortunate, but likely rare. Since new functions are
only introduced in minor/major versions, this should be manageable,
but must be considered for every PHP upgrade.
- Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
declare functions called from some file in the file's namespace to
intercept these calls. This use-case would break. That said, it is
somewhat of a fragile approach to begin with, given that it wouldn't
work for fully qualified calls, or unnamespaced code.
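The mocking pattern from the last bullet looks roughly like this. It is a simplified sketch, not ClockMock's actual code. The namespace App\Clock, the fixed timestamp and seconds_since() are assumptions for illustration.

```php
<?php
namespace App\Clock;

// Test double declared in the caller's namespace (hypothetical fixed value).
function time(): int
{
    return 1700000000;
}

function seconds_since(int $start): int
{
    // Unqualified call: resolves to App\Clock\time() under today's order.
    // With global-first lookup it would resolve to the internal time().
    return time() - $start;
}
```

Today seconds_since(1699999990) returns 10 because the local time() intercepts the call. Writing \time() in seconds_since() would already bypass the mock, which is the fragility mentioned above.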
I performed a small impact analysis [6]. There are 484 namespaced
functions shadowing global, internal functions in the top 1000
composer packages. However, the vast majority (464) of these functions
come from thecodingmachine/safe, whose entire purpose is offering
safer wrappers around internal functions. Excluding this library,
there are only 20 shadowing functions, which is surprisingly little.
Furthermore, the patch would have no impact on users of
thecodingmachine/safe, only on the library code itself.
As for providing a migration path: One approach might be to introduce
an INI setting that performs the function lookup in both local and
global scope at run-time, and informs the user about the behavioral
change in the future. To mitigate it, an explicit use function would
need to be added to the top of the file, or the call would need to be
prefixed with namespace\. The impact analysis [6] also provides a
script that looks for shadowing functions in your project. It does not
identify uses of these functions (yet), just their declarations.
Lastly, I've already raised this idea in the PHP Foundation's internal
chat but did not receive much positive feedback, mostly due to fear of
the potential BC impact. I'm not particularly convinced this is an
issue, given the impact analysis. Given the surprisingly large
performance benefits, I was inclined to raise it here anyway. It also
sparked some related ideas, like providing modules that lock
namespaces and optimize multiple files as a singular unit. That said,
such approaches would likely be significantly more complex than the
approach proposed here (~30 lines of C code).
Anyway, please let me know about possible concerns, broken use-cases,
or any alternative approaches that may come to mind. I'm looking
forward to your feedback.
Ilija
[1] https://github.com/php/php-src/pull/12461
[2] https://github.com/php/php-src/pull/13634
[3] https://github.com/php/php-src/issues/13632
[4] https://github.com/php/php-src/pull/14529
[5] https://github.com/symfony/symfony/blob/7.1/src/Symfony/Bridge/PhpUnit/ClockMock.php
[6] https://gist.github.com/iluuu1994/4b83481baac563f8f0d3204c697c5551
So, what you’re saying is that symfony and laravel can get a performance increase by simply adding a \ in the right places? Why don’t they do that instead of changing the language?
There are a few noteworthy downsides:
- Unqualified calls to functions in the same namespace would be
slightly slower, because they now involve checking global scope first.
I believe that unqualified, global calls are much more common, so this
change should still result in a net positive. It's also possible to
avoid this cost by adding a use function to the top of the file.
For functions/classes in the same exact namespace, you don’t need a use statement. But after this change, you do in certain cases?
namespace Foo;
function array_sum($bar) {}
function baz($bar) {
    return array_sum($bar);
}
So, how do you use that function in the same file?
- Introducing new functions in the global namespace could cause a BC
break for unqualified calls, if the function happens to have the same
name. This is unfortunate, but likely rare. Since new functions are
only introduced in minor/major versions, this should be manageable,
but must be considered for every PHP upgrade.
We can only see open source code when doing impact analysis. This means picking even a slightly “popular” name could go very poorly.
- Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
declare functions called from some file in the file's namespace to
intercept these calls. This use-case would break. That said, it is
somewhat of a fragile approach to begin with, given that it wouldn't
work for fully qualified calls, or unnamespaced code.
See above. I’ve seen this “trick” used on many closed source projects. I’ve also seen it used when PHP has a bug and the workaround is to implement it in php like this.
— Rob
Hi Rob
So, what you’re saying is that symfony and laravel can get a performance increase by simply adding a \ in the right places? Why don’t they do that instead of changing the language?
Nothing, of course. However, a Symfony maintainer has expressed
uninterest in prefixing all internal function calls, including
automated use statements at the top of the file. Even if they did,
most users will not.
For functions/classes in the same exact namespace, you don’t need a use statement. But after this change, you do in certain cases?
namespace Foo;
function array_sum($bar) {}
function baz($bar) {
    return array_sum($bar);
}
So, how do you use that function in the same file?
Yes. But I'm not sure how that's different from today? If there's a
local and global function declared with the same name, and you intend
to call the global one, you'll already need to disambiguate the call
with a \.
With this change, your two options would be to:
- Prefix your calls with namespace\. That's quite ugly, but is the
syntax we currently offer.
- Add a use function Foo\array_sum; to the top of the file.
An explicit use has upsides too. It makes it much more obvious that
the global function is shadowed.
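As a small sketch of the first option, using the array_sum example from the question: both explicit spellings below already work in current PHP and do not depend on the lookup order. The string return value of the local function is made up so the two calls are easy to tell apart.

```php
<?php
namespace Foo;

// Hypothetical local function that shadows the internal array_sum().
function array_sum(array $bar): string
{
    return 'Foo\array_sum';
}

function baz(array $bar): array
{
    return [
        // Relative to the current namespace: always Foo\array_sum().
        'explicit_local'  => namespace\array_sum($bar),
        // Fully qualified: always the internal array_sum().
        'explicit_global' => \array_sum($bar),
    ];
}
```

baz([1, 2, 3]) yields 'Foo\array_sum' for the local call and 6 for the global one.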
We can only see open source code when doing impact analysis. This means picking even a slightly “popular” name could go very poorly.
Yes, and there are many more than 10 000 composer repositories. An
impact analysis can give you an approximation for breakage, not
absolute numbers.
Ilija
Function lookup in either global or local scope is problematic, and
probably that's why we don't have autoloading for functions yet.
How about we change the language so that PHP 9.0 triggers a notice
whenever a call falls back to the global namespace?
We would upgrade that to a warning in PHP 9.2, and it would end up being an
error in PHP 10, which is where the BC break happens.
I don't think adding a \ to each function call (or a use statement) is
ugly; that's what we have for classes, and it works fine.
So, why do we think that after people get used to it, they would still
consider it ugly? Never heard the "ugliness" mentioned for classes.
Now, I know this would be a big BC break, but it brings consistency to the
language and forces everyone to improve their code performance.
If that's not acceptable, then maybe considering all unqualified functions
as belonging to global namespace only might be an alternative solution.
That means that non-global functions must be imported or be fully
qualified, and that would be a BC break as well, but a smaller one.
To sum up, I think we need to remove the fallback behavior, so we can have
better things in the future.
Either keep only local with a bigger BC break but a better language
consistency.
Or keep only global with a smaller BC break.
Regards,
Alex
We would upgrade that to a warning in PHP 9.2, and it would end up
being an error on PHP 10 and have a BC break.
I don't think adding a \ to each function call is ugly, that's what
we have for classes, and it works fine; or an use statement.
So, why do we think that after people get used to it, they would
still consider it ugly? Never heard the "ugliness" mentioned for
classes.
Respectfully, I think \ is ugly for both functions and classes.
Now, I know this would be a big BC break, but it brings consistency
to the language and forces everyone to improve their code
performance.
There should be a directive for this, like:
namespace foo using global functions;
...which automatically acts as if all functions have a \ in front of
them, unless they are fully qualified.
Respectfully, I feel like this gets into the heart of a problem with RFCs, where if someone wants to implement something, they have to solve everyone’s problems.
In this case, there is a problem with performance issues due to multiple lookups (though, I’m not convinced fully), so if someone wants to implement function autoloading, they also have to solve this problem (Gina and I have both independently solved it in various ways).
Personally, I’m of the opinion that if you want performance, you know what to do: fully qualify your names. If you don’t care (which is what I gather from the first email in this thread where maintainers were not willing to change their code), then “deal with it.”
The vast majority of performance issues won’t be caused by function lookups, but by databases and poorly written code. Maybe I am wrong, but I rather like what we currently have, whatever benchmarks have to say on the matter.
— Rob
To sum up, I think we need to remove the fallback behavior, so we can have better things in the future.
Either keep only local with a bigger BC break but a better language consistency.
Or keep only global with a smaller BC break.
I have long been in favor of a larger BC break with better language
consistency. Class lookup and function lookup with respect to
namespaces should be treated the same. The difficulty is getting a
majority of people to vote yes for this. Keep in mind that qualifying
every global function is annoying but probably can be somewhat
automated, and will bring better performance. So again, this improves
the existing code even without upgrading.
Yes, there would be complaints about it. Yes, there are probably some
people or projects who wouldn't upgrade. I don't particularly care, as
there are increasingly more operating systems and companies providing
LTS support for long periods of time. Probably Zend.com will offer LTS
support for the last PHP 8.X release, and possibly there will be some
distro which also has it. I believe it's the right thing to do
because:
- It's faster.
- It enables function autoloading in a similar manner to class autoloading.
- It's more consistent, and simpler to teach and maintain.
It's rare that you get all of these together, often you have to make
tradeoffs within them.
Hi Levi
On Tue, Aug 20, 2024 at 5:14 PM Levi Morrison
levi.morrison@datadoghq.com wrote:
I have long been in favor of a larger BC break with better language
consistency. Class lookup and function lookup with respect to
namespaces should be treated the same. The difficulty is getting a
majority of people to vote yes for this. Keep in mind that qualifying
every global function is annoying but probably can be somewhat
automated, and will bring better performance. So again, this improves
the existing code even without upgrading.
Yes, there would be complaints about it. Yes, there are probably some
people or projects who wouldn't upgrade. I don't particularly care, as
there are increasingly more operating systems and companies providing
LTS support for long periods of time. Probably Zend.com will offer LTS
support for the last PHP 8.X release, and possibly there will be some
distro which also has it. I believe it's the right thing to do
because:
- It's faster.
- It enables function autoloading in a similar manner to class autoloading.
- It's more consistent, and simpler to teach and maintain.
It's rare that you get all of these together, often you have to make
tradeoffs within them.
The approach I originally proposed also solves 1. and 2. (mostly) with
very little backwards incompatibility. Consistency is absolutely
something to strive for, but not at the cost of breaking most PHP
code.
To clarify on 2.: The main issue with function autoloading today is
that the engine needs to trigger the autoloader for every unqualified
call to global functions, given that the autoloader might declare the
function in local scope. As most unqualified calls are global calls,
this adds a huge amount of overhead.
Gina solved this in part by aliasing the local function to the global
one after the first lookup. However, that still means that the
autoloader will trigger for every new namespace the function is called
in, and will also pollute the function table.
Reversing the lookup order once again avoids local lookup when calling
global functions in local scope, which also means dodging the
autoloader. The caveat is that calling local functions in local scope
triggers the autoloader on first encounter, but at least it can be
marked as undeclared in the symbol table once, instead of in every
namespace, which also means triggering the autoloader only once.
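A toy simulation of the argument above, counting how often a (hypothetical) function autoloader would be asked for a name. PHP has no function autoloading today, so every name here is an assumption. The sketch only models calls that end up at an already declared global function and does not model autoloading of the global name itself.

```php
<?php
// Simulation only. $calls is a list of [namespace, function name] pairs,
// $declared is a set of function names that already exist.
function count_autoloader_triggers(array $calls, array $declared, bool $globalFirst): int
{
    $triggers = 0;
    $knownMissing = []; // local names the autoloader already failed to provide

    foreach ($calls as [$namespace, $name]) {
        $local = $namespace . '\\' . $name;

        if ($globalFirst && isset($declared[$name])) {
            continue; // global hit: the local name is never looked up
        }
        if (isset($declared[$local]) || isset($knownMissing[$local])) {
            continue; // local hit, or already known to be missing
        }
        // The local name is unknown, so the autoloader is asked for it once.
        $triggers++;
        $knownMissing[$local] = true;
    }

    return $triggers;
}

$declared = ['strlen' => true];
$calls = [['A', 'strlen'], ['A', 'strlen'], ['B', 'strlen'], ['C', 'strlen']];
```

With these inputs, local-first lookup asks the autoloader three times (once per calling namespace), while global-first lookup never asks it.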
Ilija
On Tue, Aug 20, 2024 at 11:34 PM Ilija Tovilo tovilo.ilija@gmail.com
wrote:
Hi,
I completely agree with Levi's perspective: aligning class and function
lookup with respect to namespaces seems a very sensible option.
It will improve consistency and pave the road for autoloading functions
without quirks.
The impact of fixing function lookup is overstated. For instance,
PHP-CS-Fixer can add "global namespace qualifiers" to all global functions
in a matter of minutes; it is not like people have to go through code and
change it manually.
To ease the transition, PHP can ship a small fixer with the next PHP
version for changing global function usage (prepending \ or adding use
statements) and be done with the inconsistency once and for all.
Kind regards,
Faizan
I am currently working on benchmarks specifically related to my function autoloading RFC, and I'm not (yet) certain there will be any performance impact related to function autoloading. I may end up eating my hat here, but in any case, there is only speculation at this point.
If this change improves performance, that's great. However, I don't think we should be changing things just for the sake of performance (or the opposite). It's great to be aware of how things affect performance, but I don't think we should make decisions purely based on it; otherwise we will never add any new features to PHP.
— Rob
On 20.08.2024 at 17:14, Levi Morrison levi.morrison@datadoghq.com wrote:
Keep in mind that qualifying
every global function is annoying but probably can be somewhat
automated, and will bring better performance. So again, this improves
the existing code even without upgrading.
Just to be sure: Would code not using namespaces also have to qualify global function calls? I admit that I somewhat skimmed the discussion so I might have missed that point.
The point where I think we disagree is that it improves the code. It may improve performance of the code (even though I somewhat doubt this has a significant impact on most projects) but it IMHO hurts readability. Writing the additional \ is less of a problem but as code is read a lot more often than written I think the additional "line-noise" is something I'd like to avoid.
Regards,
- Chris
On Tue, Aug 20, 2024 at 8:26 PM Christian Schneider
cschneid@cschneid.com wrote:
On 20.08.2024 at 17:14, Levi Morrison levi.morrison@datadoghq.com wrote:
Keep in mind that qualifying
every global function is annoying but probably can be somewhat
automated, and will bring better performance. So again, this improves
the existing code even without upgrading.
Just to be sure: Would code not using namespaces also have to qualify global function calls? I admit that I somewhat skimmed the discussion so I might have missed that point.
Code that isn't in a namespace is in the global namespace. So no, such
code does not have to qualify the global function calls.
On Wed, Aug 21, 2024, 9:34 AM Christian Schneider cschneid@cschneid.com
wrote:
Hi Chris,
You don't have to write additional \,
you can add "use function" statements
if you prefer that style.
It's no different from referencing global classes, they either need to be
prefixed with \ or need to have a corresponding "use" statement.
Kind regards,
Faizan
On 21.08.2024 at 09:44, Faizan Akram Dar hello@faizanakram.me wrote:
The point where I think we disagree is that it improves the code. It may improve performance of the code (even though I somewhat doubt this has a significant impact on most projects) but it IMHO hurts readability. Writing the additional \ is less of a problem but as code is read a lot more often than written I think the additional "line-noise" is something I'd like to avoid.
You don't have to write additional \,
you can add "use function" statements
if you prefer that style.
I think that is trading one problem for another:
- Having to declare all global functions like strlen with 'use' is (IMHO) unnecessary boilerplate which also needs to be kept in sync with the rest of the code below
- I am generally wary of top declarations changing "semantics" of code further down the line. Being able to tell what is being done without (far away) context is a feature and that's why I e.g. prefer foo($GLOBALS['bar']) to global $bar; ... foo($bar).
Regards,
- Chris
- Prefix your calls with namespace\. That's quite ugly, but is the
syntax we currently offer.
I was thinking about this earlier, and how the migration is pretty much
the same (and equally automatable) in either direction:
- If unqualified calls become always local, then every global function
call needs a use statement or prefixing with "\".
- If they become always global, then every local function call needs a
use statement or prefixing with "namespace\".
But the first option probably requires changes in the majority of PHP
files in use anywhere; whereas the second only affects a small minority
of code bases, and a small minority of code in those.
BUT, if people already complain about "\" being ugly, having to write
"namespace\" is going to make them REALLY grumpy...
So maybe at the same time (or, probably, in advance) we need to come up
with a nicer syntax for explicitly referencing the current namespace.
Unfortunately, finding unused syntax is hard, which is why we have "\"
in the first place (and for the record, I think it works just fine), but
maybe something like "_\" could work? Giving us:
namespace Foo;
$native_length = strlen('hello'); # same as \strlen('hello')
$foo_length = _\strlen('hello'); # same as \Foo\strlen('hello')
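For comparison, the explicit forms that already exist in PHP today can be sketched as follows. This is only an illustration: the namespace Foo and the function count_words() are made up, and the proposed "_\" shorthand is not valid PHP, so it is not shown.

```php
<?php
namespace Foo;

// Option 1 for global functions: import once per file.
use function array_sum;

// Hypothetical local function, defined in namespace Foo.
function count_words(string $text): int
{
    // Option 2 for global functions: prefix each call with a backslash.
    return \count(\explode(' ', $text));
}

echo array_sum([1, 2, 3]), "\n";            // the imported global function
echo namespace\count_words('a b c'), "\n";  // explicitly the current namespace
```

Either migration direction would rely on one of these forms; the difference is only which kind of call has to carry the extra marker.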
If I had a time machine, I'd campaign for "unqualified means local" in
PHP 5.3, and we'd all be used to writing "\strlen" by now; but
"unqualified means global" feels much more achievable from where we are.
--
Rowan Tommins
[IMSoP]
I was thinking about this earlier, and how the migration is pretty much the same (and equally automatable) in either direction:
- If unqualified calls become always local, then every global function call needs a use statement or prefixing with "\".
- If they become always global, then every local function call needs a use statement or prefixing with "namespace\".
But the first option probably requires changes in the majority of PHP files in use anywhere; whereas the second only affects a small minority of code bases, and a small minority of code in those.
BUT, if people already complain about "\" being ugly, having to write "namespace\" is going to make them REALLY grumpy...
So maybe at the same time (or, probably, in advance) we need to come up with a nicer syntax for explicitly referencing the current namespace.
Unfortunately, finding unused syntax is hard, which is why we have "\" in the first place (and for the record, I think it works just fine), but maybe something like "_\" could work? Giving us:
namespace Foo;
$native_length = strlen('hello'); # same as \strlen('hello')
$foo_length = _\strlen('hello'); # same as \Foo\strlen('hello')
If having to type \strlen() is ugly, and I agree that it is, then having to type _\strlen() is what in university we would call "fugly," to emphasize just how much worse something was vs. just run-of-the-mill "ugly."
Having to prefix with a name like Foo, e.g. Foo\strlen() is FAR PREFERABLE to _\strlen() because at least it provides satiating information rather than the empty calories of a cryptic shorthand. #jmtcw, anyway.
If I had a time machine, I'd campaign for "unqualified means local" in PHP 5.3, and we'd all be used to writing "\strlen" by now; but "unqualified means global" feels much more achievable from where we are.
If I had a time machine I would campaign for real packages instead of what namespaces turned out to be, and for sigils that do not double as the escape character for strings, but then both of us digress.
-Mike
Having to prefix with a name like Foo, e.g. Foo\strlen() is FAR PREFERABLE to _\strlen() because at least it provides satiating information rather than the empty calories of a cryptic shorthand. #jmtcw, anyway.
I knew I'd regret keeping the example short. Realistically, it's not a substitute for "\Foo\strlen", it's a substitute for "\AcmeComponents\SplineReticulator\Utilities\Text\strlen".
Having a syntax for "relative to current" is incredibly common in other path-like syntaxes. The most common marker is ".", and ".\foo" is literally how you'd refer to something in the current directory under DOS/Windows. But unfortunately, we don't have "." available, so I wondered if "_" would feel similar enough.
Another option would be to find a shorter keyword than "namespace" to put it in front. "ns\strlen(...)" is an obvious step from what we have currently, but it's not very obvious what it means, so maybe there's a different word we could use.
Rowan Tommins
[IMSoP]
On 23 August 2024 00:15:19 BST, Mike Schinkel mike@newclarity.net wrote:
Having to prefix with a name like Foo, e.g. Foo\strlen() is FAR
PREFERABLE to _\strlen() because at least it provides satiating
information rather than the empty calories of a cryptic shorthand.
#jmtcw, anyway.
I knew I'd regret keeping the example short. Realistically, it's not
a substitute for "\Foo\strlen", it's a substitute for
"\AcmeComponents\SplineReticulator\Utilities\Text\strlen".
Having a syntax for "relative to current" is incredibly common in
other path-like syntaxes. The most common marker is ".", and ".\foo"
is literally how you'd refer to something in the current directory
under DOS/Windows. But unfortunately, we don't have "." available, so
I wondered if "_" would feel similar enough.
Another option would be to find a shorter keyword than "namespace" to
put it in front. "ns\strlen(...)" is an obvious step from what we
have currently, but it's not very obvious what it means, so maybe
there's a different word we could use.
Rowan Tommins
[IMSoP]
Could be mistaken, but I think the way PHP handles namespaces
internally is sort of the same as a long string, rather than as a
tree/hierarchy.
ie. \AcmeComponents\SplineReticulator\Utilities\Text\strlen
is really like:
class AcmeComponentsSplineReticulatorUtilitiesTextstrlen {
public function __construct(){
}
}
And the "AcmeComponentsSplineReticulatorUtilitiesText" just kind of
gets appended to the front when the class name is registered.
I haven't done work on the namespace code, but I recall reading this
somewhere recently.
This is mostly correct; the only thing missing from your strings is the \ character. I believe this even happens during compilation, meaning the compiler sees your namespace and use statements and then rewrites the function/class calls at compile time. Thus an unqualified call is prepended with the current namespace defined at the top of the file.
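The flat-string view can be observed from userland. The following is a minimal sketch, assuming a made-up namespace AcmeComponents\Text and a made-up function shout(); it only shows that the fully qualified name behaves like one string with backslashes in it, and says nothing about how the engine stores it internally.

```php
<?php
namespace AcmeComponents\Text;

// Hypothetical namespaced function.
function shout(string $s): string
{
    return \strtoupper($s) . '!';
}

// The fully qualified name is a single string that includes the backslashes.
$name = __NAMESPACE__ . '\shout';
echo $name, "\n";

// The same string can be used to look the function up or to call it.
\var_dump(\function_exists('AcmeComponents\Text\shout'));
echo $name('hello'), "\n";
```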
If we were to go with any major change in the current lookup where it is perf or nothing, this is what I would propose for php 9.0 (starting with an immediate deprecation):
1. any unqualified call simply calls the current namespace
2. >= php 9.0: no fallback to global
3. < php 9.0: emit deprecation notice if falls back to global
This is how classes work (pretty sure), so it would be consistent.
Going the other way (global first) doesn't really make sense because it is inconsistent, IMHO. Will it suck? Probably. Will it be easy to fix? Probably via Rector.
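The difference between classes and functions that this relies on can be checked with today's PHP. A small sketch, assuming a made-up namespace App that defines neither strlen() nor ArrayObject:

```php
<?php
namespace App;

// Functions: App\strlen() is not defined, so the call falls back to \strlen().
echo strlen('abc'), "\n";

// Classes: an unqualified name is resolved against the current namespace only.
echo ArrayObject::class, "\n";

// There is no App\ArrayObject and no fallback to the global \ArrayObject.
\var_dump(\class_exists(ArrayObject::class));
\var_dump(\class_exists(\ArrayObject::class));
```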
— Rob
If we were to go with any major change in the current lookup where it
is perf or nothing, this is what I would propose for php 9.0
(starting with an immediate deprecation):
1. any unqualified call simply calls the current namespace
2. >= php 9.0: no fallback to global
3. < php 9.0: emit deprecation notice if falls back to global
This is how classes work (pretty sure), so it would be consistent.
Going the other way (global first) doesn't really make sense because
it is inconsistent, IMHO. Will it suck? Probably. Will it be easy to
fix? Probably via Rector.
— Rob
A third option, which I haven't seen come up on the list yet, is that
unqualified functions that are PHP built-ins are treated as global, and
using a function having the same name as a built-in, in a namespace
scope, requires a fully qualified name to override the built-in.
It seems that if someone is writing array_key_exists() or similar
they probably mean the built-in function, and in the rare cases where
they do mean \foo\array_key_exists(), they can write it explicitly.
Functions that are not on the built-in function list could default to
the local namespace.
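Whether a given name belongs to a built-in is something PHP can already report through reflection. The sketch below only shows that distinction from userland; it is not the proposed lookup rule, and the namespace Foo and the functions helper() and origin() are made up for illustration.

```php
<?php
namespace Foo;

// Hypothetical userland function in namespace Foo.
function helper(): string
{
    return 'local';
}

// Reports whether a function name refers to a built-in or to userland code.
function origin(string $function): string
{
    return (new \ReflectionFunction($function))->isInternal() ? 'built-in' : 'userland';
}

echo origin('array_key_exists'), "\n";
echo origin('Foo\helper'), "\n";
```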
I was actually thinking of doing something like this for function autoloading, where extensions could register global functions that bypass the autoloader and go straight to global if it isn't defined in the local namespace already. I decided not to even bring it up because it felt controversial (it would effectively be global first, except for user functions). Though, it might be a nice compromise?
— Rob
I think we are all trying to achieve the same thing here.
There are different ways to go about it, each with pros and cons, but I
think we need to think through how all the parts fit together, because
I think if we can reach consensus, then several RFCs become compatible
with each other.
A change to how function namespace lookup works would affect function
auto-loading, and it may even make it simpler, depending on the other
options that we end up going with.
On 23.08.2024 at 11:34, Nick Lockheart lists@ageofdream.com wrote:
I think we are all trying to achieve the same thing here.
I'm not sure who "we" and what "same thing" here exactly is.
I recall the following arguments for changing the current situation about function lookups:
- Performance
- Function autoloading
- Consistency
Did I miss something big?
First of all I don't think the performance argument holds enough weight as I'm very doubtful this impacts performance of a real world application in a significant way. And for people really hitting this problem there is a solution already.
Secondly I am a bit confused about the whole function autoloading discussion: There is already a good-enough mechanism (putting them as static functions inside a tool class). I just don't consider the hoops we have to jump through to get a more "pure" or fine-grained solution for a special problem worth it. As for the "don't use classes for static functions" I've yet to see a good argument apart from personal preference.
As far as consistency goes I've yet to encounter someone being confused about function resolution. But then again I'm not reaching namespaces for PHP classes.
While modern tooling can possibly adapt source code to the new style efficiently, I have to maintain too many installations of PHP projects on various hosters to look forward to that. And the argument that "you can just stay on an old PHP version" is just not a feasible solution either.
Maybe we should take a step back and reevaluate the pros and cons.
- Chris
On 23.08.2024 at 11:34, Nick Lockheart lists@ageofdream.com wrote:
I think we are all trying to achieve the same thing here.
I'm not sure who "we" and what "same thing" here exactly is.
I recall the following arguments for changing the current situation about function lookups:
- Performance
- Function autoloading
- Consistency
Did I miss something big?
Nick was replying to me :p, judging by the quoted paragraph.
First of all I don't think the performance argument holds enough weight as I'm very doubtful this impacts performance of a real world application in a significant way. And for people really hitting this problem there is a solution already.
Secondly I am a bit confused about the whole function autoloading discussion: There is already a good-enough mechanism (putting them as static functions inside a tool class). I just don't consider the hoops we have to jump through to get a more "pure" or fine-grained solution for a special problem worth it. As for the "don't use classes for static functions" I've yet to see a good argument apart from personal preference.
As far as consistency goes I've yet to encounter someone being confused about function resolution. But then again I'm not reaching namespaces for PHP classes.
As far as function overloading goes, I recommend checking out a draft RFC I've been working on a very, very long time: https://wiki.php.net/rfc/records. In some off-list discussions, it was clear that if I wanted this syntax, I would need to pursue function autoloading. Further, function autoloading is a clearly missing feature that would be useful in many situations. If function autoloading doesn't work out, I will need to take a different approach to that syntax (which is fine, but not something I want because I chose the syntax for a very good reason). That being said, I'm not ready to discuss records here, so this is the first and last time I'll mention it on the thread. There is a Reddit post in r/php and a GitHub repo if you are interested in discussing records. There are very many things to work out still, and it is very much work-in-progress.
— Rob
On 23.08.2024 at 12:27, Rob Landers rob@bottled.codes wrote:
On 23.08.2024 at 11:34, Nick Lockheart lists@ageofdream.com wrote:
I think we are all trying to achieve the same thing here.
I'm not sure who "we" and what "same thing" here exactly is.
Nick was replying to me :p, judging by the quoted paragraph.
The "all" in his sentence suggested to me that he means more than him and you.
But then again I might have misinterpreted this.
As far as function overloading goes, I recommend checking out a draft RFC I've been working on a very, very long time: https://wiki.php.net/rfc/records. In some off-list discussions, it was clear that if I wanted this syntax, I would need to pursue function autoloading.
Definitely an interesting read, thanks a lot for the work you put into it!
Further, function autoloading is a clearly missing feature that would be useful in many situations.
The "clearly missing" and "many" parts are where I disagree. But I was mainly considering current PHP, not future PHP syntax like the Records stuff, agreed.
If function autoloading doesn't work out, I will need to take a different approach to that syntax (which is fine, but not something I want because I chose the syntax for a very good reason).
I know you do not want to discuss this here as it is off-topic but it kind of feels like the only advantage is to get rid of "new" in the usage of Records. But I'll leave it at that as per your request; we can revisit that once the RFC hits the discussion stage.
That being said, I'm not ready to discuss records here, so this is the first and last time I'll mention it on the thread. There is a Reddit post in r/php and a GitHub repo if you are interested in discussing records. There are very many things to work out still, and it is very much work-in-progress.
Also a bit off-topic but I still have to mention it, maybe worth another thread:
I understand where you are coming from but at the same time it feels a bit worrying to me to use another medium (reddit) for a discussion about future language features when we have this mailing list.
I hope this won't mean that questions/suggestions/concerns on this mailing list will be discredited because of discussions which happened elsewhere. I'm sorry if I sound a bit paranoid here but I've been in this situation before in other (not software related) aspects of my life, where I was told that something was already decided and people were not willing to go back on certain issues because of that.
Regards,
- Chris
Also a bit off-topic but I still have to mention it, maybe worth another thread:
I understand where you are coming from but at the same time it feels a bit worrying to me to use another medium (reddit) for a discussion about future language features when we have this mailing list.
Don't be worried about it too much. Many RFCs start somewhere else first before they end up here. First as an idea, then a draft, then they ask friends/coworkers to read them over, etc. By the time it ends up on the list, a lot of work has been done (in some cases). Sometimes, they are simple-ish RFCs that need little work and are pretty straightforward, but for more complex ones, there are usually several cycles before it will end up on the mailing list. Further, it has come to my attention that an implementation is basically an unwritten requirement, so spending time on that is also a delay there. At least, that has been my experience so far with that one.
I hope this won't mean that questions/suggestions/concerns on this mailing list will be discredited because of discussions which happened elsewhere. I'm sorry if I sound a bit paranoid here but I've been in this situation before in other (not software related) aspects of my life, where I was told that something was already decided and people were not willing to go back on certain issues because of that.
The way I look at it, nothing is set in stone until everyone has seen it and had a chance to respond. Yes, there are good reasons that things are the way they are in that RFC, and during discussion, I expect those reasons will come up. I have no idea if those reasons stand up under scrutiny, and I won't find out until then. These are all known unknowns.
There are some people on the list who believe once it is on the list, it is unchangeable unless you are a voter and act accordingly, such as ignoring non-voter concerns. I, personally, feel that shouldn't be how things work. It is "our language" (voter or not) and not "Rob's language." I guess we will see how that plays out in the coming months.
Regards,
- Chris
— Rob
I understand where you are coming from but at the same time it feels a bit worrying to me to use another medium (reddit) for a discussion about future language features when we have this mailing list.
I hope this won't mean that questions/suggestions/concerns on this mailing list will be discredited because of discussions which happened elsewhere. I'm sorry if I sound a bit paranoid here but I've been in this situation before in other (not software related) aspects of my life, where I was told that something was already decided and people were not willing to go back on certain issues because of that.
FWIW, Rob asked me to review his ideas in email and I pushed him to open up a repo so that discussions could be captured and made public rather than lost to the ethers of private email.
The problem with discussing everything on the list from minute one is that the list has a habit of punishing those who bring ideas that are not fully-baked. It is very hard to brainstorm anything on the list without getting shot down by those who expect everything discussed on the list to already be fully fleshed out.
Better to get a group of motivated individuals who all want to see a related RFC succeed and get them to work through the issues enough that when (and if) it is brought to the list there will not be obvious negative arguments against it to nip it in the bud when it otherwise could be something worthwhile.
Also, whereas only discussion happens on the list, in a repo you can actually start writing a potential implementation and crafting an eventual RFC, so it has the potential to be more productive. Finally, the list is the gatekeeper anyway, so if things are already decided that the list disagrees with, it still won't pass.
So, if you feel you are a stakeholder on this idea, I'd suggest celebrating the idea of a repo and joining the discussion there.
-Mike
P.S. As for using Reddit, I'm not as big a fan of that, as it has its own culture where it can be just as difficult to incubate ideas as here on this list.
A third option, which I haven't seen come up on the list yet, is that
unqualified functions that are PHP built-ins are treated as global, and
using a function having the same name as a built-in, in a namespace
scope, requires a fully qualified name to override the built-in.
It seems that if someone is writing array_key_exists() or similar
they probably mean the built-in function, and in the rare cases where
they do mean \foo\array_key_exists(), they can write it explicitly.
Functions that are not on the built-in function list could default to
the local namespace.
I was going back and forth on this.
On one hand it could be confusing for developers to learn when to use \ and when not to.
OTOH, once they learn it, it would create a clear indication of which functions are userland code and which are standard library, and I think this could have significant benefit for readability and maintainability.
Yet OTOH, this would create a bifurcation between userland and standard library that is encouraged in some languages and discouraged in others, i.e. the latter being the "keep the language as small as possible and then let the standard library be implemented in the language no differently than any code a user could write" type of languages.
Yet OTOH still, PHP does not allow core functions to be implemented in userland, at least without monkey patching so the bifurcation already exists.
Go, for example, has both. Most of the standard library is just Go code anyone could replace with their own, but a handful of functions are special and built into the language, e.g. append, new, close, etc. Basically anything in lowercase that can be used globally without qualification is special. And after using Go, I think it is a great design as it reserves future enhancements without BC concerns.
In theory it would be nice to open up PHP to allow overriding core functions, but that could also open a Pandora's box, the kind that makes Ruby code so fragile. At least in Go you have to omit the standard lib import and use your own import to override a standard library package.
So in practice PHP may never change to allow core functions to be overridden and thus pining for that to block this idea would be a missed opportunity. (That said, PHP could allow a userland function to be "registered" to be called instead of the core function, and if that were allowed then Nick's proposal would cause no problems. Of course I doubt internals would ever bless that idea.)
Anyway, in summary, I think Nick's 3rd option has wings. And along with automatic use statements for each namespace it might just be the best possible solution.
-Mike
P.S. If PHP ever added a set of standard library functions written in PHP to the core distribution, they should rightly IMO need to be namespaced, per this proposal. But here I digress. I only mention in hopes to keep this specific dream alive for some future day.
In theory it would be nice to open up PHP to allow overriding core functions, but that could also open a Pandora's box, the kind that makes Ruby code so fragile. At least in Go you have to omit the standard lib import and use your own import to override a standard library package.
So in practice PHP may never change to allow core functions to be overridden and thus pining for that to block this idea would be a missed opportunity. (That said, PHP could allow a userland function to be "registered" to be called instead of the core function, and if that were allowed then Nick's proposal would cause no problems. Of course I doubt internals would ever bless that idea.)
There are already PECL extensions which allow to do this, e.g.
And I'd leave it at this.
Christoph
A third option, which I haven't seen come up on the list yet, is that
unqualified functions that are PHP built-ins are treated as global, and
using a function having the same name as a built-in, in a namespace
scope, requires a fully qualified name to override the built-in.
It seems that if someone is writing array_key_exists() or similar
they probably mean the built-in function, and in the rare cases where
they do mean \foo\array_key_exists(), they can write it explicitly.
Functions that are not on the built-in function list could default to
the local namespace.
This doesn't solve the "future hidden BC break" aspect, at all.
If I write a namespaced function http_parse_url to adhere more strictly to whatever relevant RFC, when I write it, it will work as expected using a relative name (and/or itself quite likely using helper functions in the same namespace).
If an RFC then approves a "core" http_parse_url function to serve as a better replacement for the old parse_url, suddenly my function won't work the same way with the new version of PHP... because of a global function that didn't exist when I wrote mine.
If this "global first" change is made, any use of unqualified functions that refer to the current namespace will have to use some form of current-namespace indicator to be future version-safe.
If you want to make global functions resolve without a \, the only realistically safe solution is to completely remove unqualified lookup of local function (and constant) symbols, and force them to always use a name that resolves absolutely, because otherwise any future version could break what they do completely.
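The same situation can be reproduced with a function that really was added later. The sketch below assumes PHP 8.0 or newer, where the global str_contains() exists, and a made-up library namespace MyLib whose helper predates it and deliberately behaves differently (it ignores case). It shows today's local-first resolution; under a global-first rule the unqualified call would silently start returning the built-in's result.

```php
<?php
namespace MyLib;

// Hypothetical helper written before PHP 8.0 added a global str_contains().
// Unlike the built-in, it ignores case.
function str_contains(string $haystack, string $needle): bool
{
    return \stripos($haystack, $needle) !== false;
}

// Unqualified: resolves to MyLib\str_contains() under the current rules.
\var_dump(str_contains('Hello', 'hello'));

// Fully qualified: always the built-in, which is case-sensitive.
\var_dump(\str_contains('Hello', 'hello'));

// Explicitly relative to the current namespace: always the local function.
\var_dump(namespace\str_contains('Hello', 'hello'));
```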
Could be mistaken, but I think the way PHP handles namespaces
internally is sort of the same as a long string, rather than as a
tree/hierarchy.
Just to be clear, PHP already has a syntax for explicitly resolving a name relative to the current namespace, it's just not needed very often. See e.g. https://3v4l.org/Xfma5 and https://3v4l.org/3o2TD (You're right that underneath it's all just string concatenation, but that's all you need in this case.)
All I was talking about was alternative syntax that would behave in exactly the same way that "namespace\Foo" already does.
--
Rowan Tommins
[IMSoP]
Hi Rowan,
Having to prefix with a name like Foo, e.g. Foo\strlen() is FAR PREFERABLE to _\strlen() because at least it provides satiating information rather than the empty calories of a cryptic shorthand. #jmtcw, anyway.
I knew I'd regret keeping the example short. Realistically, it's not a substitute for "\Foo\strlen", it's a substitute for "\AcmeComponents\SplineReticulator\Utilities\Text\strlen".
And similarly, I too regret keeping my answer short. I was assuming what I omitted would be obvious.
(And I am not being snarky, I literally thought about including this next but then felt I did not need to. Hindsight!)
So, long namespaces are why PHP has the use statement, making references in functions short and sweet, e.g.:
namespace \AcmeComponents\SplineReticulator\Utilities\Text
use \AcmeComponents\SplineReticulator\Utilities\Text
function Foo():int {
return Text\strlen("Hello World");
}
(Of course, that is a lot of redundant boilerplate.)
Another option would be to find a shorter keyword than "namespace" to put it in front. "ns\strlen(...)" is an obvious step from what we have currently, but it's not very obvious what it means, so maybe there's a different word we could use.
So rather than all that boilerplate, and rather than yet another special set of characters developers would need to learn and remember, and tooling would need to adjust to, we could instead easily add an automatic use statement for every namespace, as long as no existing use statement conflicts with it.
An automatic use would then give us the following, which provides a strong information scent and is really consistent with the nature of the PHP language:
namespace \AcmeComponents\SplineReticulator\Utilities\Text
function Foo():int {
return Text\strlen("Hello World");
}
The above of course could result in BC breaks IF there happened to be existing code that referenced Text\strlen() where Text was a top-level namespace, AND that code was not remediated when this change takes place. However, I am guessing those collisions would be pretty rare as both the namespace and the symbol would have to match to be in conflict.
Having a syntax for "relative to current" is incredibly common in other path-like syntaxes. The most common marker is ".", and ".\foo" is literally how you'd refer to something in the current directory under DOS/Windows. But unfortunately, we don't have "." available, so I wondered if "_" would feel similar enough.
I'll be honest, the association with the relative path of .\ did not occur to me when you presented _\ so after you stated this I pondered it from that perspective.
However, frankly, I am not sold on that perspective. Something about it does not feel right. I can't currently give any more objective arguments than that, so I will just leave it as #jmtcw.
I will say if we were going with relative path, I think \\strlen() would be preferable to _\strlen(). Subjectively \\ is easier for me to "see" and thus does not look so out of place to me.
OTOH the objective arguments for \\ over _\ are that it is much easier to type: slash+slash vs. shift-underscore+nonshift-slash. There is a precedent in URIs with //, albeit not exactly equivalent. And finally, it also does not use a sigil that could be better used elsewhere in some as-yet-to-be agreed or envisioned future use. #fwiw
-Mike
namespace \AcmeComponents\SplineReticulator\Utilities\Text
function Foo():int {
return Text\strlen("Hello World");
}
The above of course could result in BC breaks IF there happened to be
existing code that referenced Text\strlen() where Text was a top-level
namespace
It wouldn't be a top-level namespace that would cause a conflict, but a nested one: currently the above code resolves the function name as "AcmeComponents\SplineReticulator\Utilities\Text\Text\strlen" (note the "...\Text\Text...").
It's an interesting suggestion, but I'm not totally sold on "use the end of the current namespace" being easier to remember than "use this symbol or keyword".
return namespace\strlen("Hello World"); # current syntax, rather long and unclear
return _\strlen("Hello World"); # short, but maybe a bit cryptic
return Text\strlen("Hello World"); # variable length, relies on current context
return NS\strlen("Hello World"); # shortening of current keyword
return self\strlen("Hello World"); # maybe confusing to reuse a keyword?
return current\strlen("Hello World"); # clear, but a bit long
--
Rowan Tommins
[IMSoP]
namespace \AcmeComponents\SplineReticulator\Utilities\Text
function Foo():int {
return Text\strlen("Hello World");
}
The above of course could result in BC breaks IF there happened to be
existing code that referenced Text\strlen() where Text was a top-level
namespace
It wouldn't be a top-level namespace that would cause a conflict, but a nested one: currently the above code resolves the function name as "AcmeComponents\SplineReticulator\Utilities\Text\Text\strlen" (note the "...\Text\Text...").
How often does that really occur in the wild?
And how can it occur without an explicit use AcmeComponents\SplineReticulator\Utilities\Text\Text statement, which I proposed would override the automatic use, anyway?
It's an interesting suggestion, but I'm not totally sold on "use the end of the current namespace" being easier to remember than "use this symbol or keyword".
return namespace\strlen("Hello World"); # current syntax, rather long and unclear
return _\strlen("Hello World"); # short, but maybe a bit cryptic
return Text\strlen("Hello World"); # variable length, relies on current context
return NS\strlen("Hello World"); # shortening of current keyword
return self\strlen("Hello World"); # maybe confusing to reuse a keyword?
return current\strlen("Hello World"); # clear, but a bit long
The only one of those that has a strong analog to existing PHP code is to "use the end of the current namespace" as people frequently do with explicit use statements.
Yes, self has a weak analog to self::, but none of the others even come close, IMO. And adding self\ may have unintended effects on the language, some slight BC concerns, and/or downstream consequences for other projects vs. a simple automatic use.
Lastly, no comment on \\?
-Mike
It wouldn't be a top-level namespace that would cause a conflict, but a nested one: currently the above code resolves the function name as "AcmeComponents\SplineReticulator\Utilities\Text\Text\strlen" (note the "...\Text\Text...").
How often does that really occur in the wild?
Oh, I think it would be much rarer than colliding with a global namespace. I was pointing out that your suggestion was better than you thought.
And how can it occur without an explicit use AcmeComponents\SplineReticulator\Utilities\Text\Text statement, which I proposed would override the automatic use, anyway?
I'm not sure what you mean. Right now, that's the function name that would be looked up for your example code (other than a couple of unrelated typos in your example). So if, for some reason, someone was relying on that, their code would break with your "automatic use".
The only one of those that has a strong analog to existing PHP code is to "use the end of the current namespace" as people frequently do with explicit use statements.
True. I just don't love the context-sensitive nature of it.
Lastly, no comment on \\?
Ah, yes, I forgot to say: I'm not keen on that because in other contexts it means exactly the opposite: it refers to the absolute root in a context where a relative name would be assumed.
For example, \\domain\username and \\server\fileshare on Windows, or //example.com/foo in a URL.
Rowan Tommins
[IMSoP]
And how can it occur without an explicit use AcmeComponents\SplineReticulator\Utilities\Text\Text statement, which I proposed would override the automatic use, anyway?
I'm not sure what you mean. Right now, that's the function name that would be looked up for your example code (other than a couple of unrelated typos in your example). So if, for some reason, someone was relying on that, their code would break with your "automatic use".
I must be missing something. Can you give a specific example showing how the automatic use would conflict with something other than a root namespace?
The only one of those that has a strong analog to existing PHP code is to "use the end of the current namespace" as people frequently do with explicit use statements.
True. I just don't love the context-sensitive nature of it.
Well, we frequently see things differently, which is what it is.
Myself, I like the explicit nature of it that allows seeing at a glance that it is Text without having to look up through a long PHP file to find the namespace statement. But as my dad says, to each his own, I guess.
Besides, I think there may be some better solutions discussed on this thread besides our dueling qualification syntax.
Lastly, no comment on \\?
Ah, yes, I forgot to say: I'm not keen on that because in other contexts it means exactly the opposite: it refers to the absolute root in a context where a relative name would be assumed.
For example, \\domain\username and \\server\fileshare on Windows, or //example.com/foo in a URL.
Fair point. But then treating built-ins differently than userland globals may be the way to go that has the least BC breakage over either of our two proposals here.
That said, having an automatic use would be nice for those who want to use it, even if it is not required.
-Mike
And how can it occur without an explicit use AcmeComponents\SplineReticulator\Utilities\Text\Text statement, which I proposed would override the automatic use, anyway?
I'm not sure what you mean. Right now, that's the function name that would be looked up for your example code (other than a couple of unrelated typos in your example). So if, for some reason, someone was relying on that, their code would break with your "automatic use".
I must be missing something. Can you give a specific example showing
how the automatic use would conflict with something other than a root
namespace?
You already gave the example yourself, you just misunderstood what its current behaviour is. Here it is, with the typos fixed: https://3v4l.org/6eD3N
As you can see from the error message, it doesn't look up "Text\strlen()", it looks up "AcmeComponents\SplineReticulator\Utilities\Text\Text\strlen()"
Or to be even clearer: https://3v4l.org/ojVcP
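For anyone reading without following the links, the same behaviour can be sketched in a single file. The nested ...\Text\Text namespace and its strlen() are hypothetical and exist here only so that the lookup succeeds instead of raising an error.

```php
<?php
namespace AcmeComponents\SplineReticulator\Utilities\Text\Text {
    // Hypothetical nested-namespace function; returns its own full name.
    function strlen(string $s): string
    {
        return __FUNCTION__;
    }
}

namespace AcmeComponents\SplineReticulator\Utilities\Text {
    function Foo(): string
    {
        // "Text\strlen" is a qualified name, so the current namespace is
        // prepended; it is not treated as a top-level "Text" namespace.
        return Text\strlen("Hello World");
    }

    echo Foo(), "\n";
}
```

This prints AcmeComponents\SplineReticulator\Utilities\Text\Text\strlen, which is the name an "automatic use" would stop resolving to.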
Thank you for clarifying.
You got me there.
And it is a good illustration why I continue to really dislike how namespaces work. Too many footguns that are too easy to misunderstand.
But it is what it is, I guess.
-Mike
BUT, if people already complain about "\" being ugly, having to write
"namespace" is going to make them REALLY grumpy...
So maybe at the same time (or, probably, in advance) we need to come
up with a nicer syntax for explicitly referencing the current
namespace.
Unfortunately, finding unused syntax is hard, which is why we have
"\" in the first place (and for the record, I think it works just
fine), but maybe something like "_" could work? Giving us:
namespace Foo;
$native_length = strlen('hello'); # same as \strlen('hello')
$foo_length = _\strlen('hello'); # same as \Foo\strlen('hello')
namespace foo using global functions;
- or -
namespace foo using local functions;
Tell PHP what you want at the per-file level.
BUT, if people already complain about "\" being ugly, having to write
"namespace" is going to make them REALLY grumpy...
So maybe at the same time (or, probably, in advance) we need to come
up with a nicer syntax for explicitly referencing the current
namespace.
namespace foo using global functions;
- or -
namespace foo using local functions;
Tell PHP what you want at the per-file level.
This doesn't seem mutually exclusive to me. If you have a file where you've opted for "using global functions", you might want a way to reference a function in the current namespace.
It also doesn't address my other point, that having global as the default mode (even if we provide an option for local) is much less disruptive to existing code.
Regards,
Rowan Tommins
[IMSoP]
On 23 August 2024 01:42:38 BST, Nick Lockheart lists@ageofdream.com
wrote:
BUT, if people already complain about "\" being ugly, having to write
"namespace" is going to make them REALLY grumpy...
So maybe at the same time (or, probably, in advance) we need to come
up with a nicer syntax for explicitly referencing the current
namespace.
namespace foo using global functions;
- or -
namespace foo using local functions;
Tell PHP what you want at the per-file level.
This doesn't seem mutually exclusive to me. If you have a file where
you've opted for "using global functions", you might want a way to
reference a function in the current namespace.
Correct, so if you use the example:
namespace foo using global functions;
you can write:
`array_key_exists()`;
and it will be resolved as global without a namespace lookup and will
use the dedicated opcode.
But if you need to use a local function you can do:
\foo\sort();
The proposed global/local declaration as part of the namespace
declaration just turns off namespace lookups and sets the default
resolution for unqualified names.
Fully qualified names are not affected.
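The "using global functions" directive is only a proposal and does not parse today. As a rough sketch of the closest equivalents that already exist, a per-function import or a leading backslash pins a call to the global function. The local sort() wrapper below is hypothetical and is there only to give \foo\sort() something to call.

```php
<?php
namespace foo;

// Import the global function once; unqualified calls to array_key_exists()
// in this file are then bound to the global function.
use function array_key_exists;

// Hypothetical local function sharing a name with a built-in.
function sort(array $items): array
{
    \sort($items);   // fully qualified: the global sort(), sorts the local copy
    return $items;
}

$hasKey = array_key_exists('a', ['a' => 1]);  // global, via the import
$sorted = \foo\sort([3, 1, 2]);               // local, fully qualified

var_dump($hasKey, $sorted);
```

The proposed directive would in effect apply the import to every global function at once, which is what removes the per-file boilerplate.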
It also doesn't address my other point, that having global as the
default mode (even if we provide an option for local) is much less
disruptive to existing code.
They are compatible, but related decisions.
I think it would be easier for people to accept a new PHP version where
unqualified names were always global, if we also had an option to make
local/namespaced the default resolution for unqualified names, on a
per-file basis, for those who need that.
Thus, there are multiple decision points:
1. Should we do namespace lookups on unqualified function calls at all?
2. If yes to 1, should we lookup in global first or local first?
3. Regardless of 1 or 2, should we let developers explicitly specify a
behavior for unqualified calls in the namespace declaration?
4. If yes to 1, should the behavior of namespace lookups change for
user-defined functions vs PHP built-in function names?
These aren't mutually exclusive, but they all work together to create a
complete behavior.
There are several ways that the above options could be combined:
OPTION ONE
Using a regular namespace declaration still does an NS lookup, in the
same order, just like it normally works now.
That means that code that uses:
namespace foo;
will behave exactly the same as today, with no BC breaks.
Developers using the new PHP version could opt-in to explicit namespace
behavior with:
namespace foo using global functions;
or
namespace foo using local functions;
In both cases, fully-qualified names still work the same.
Only unqualified names are affected by this directive, and they use
local only or global only, depending on the declaration.
OPTION TWO
Namespace lookup is removed from a future version of PHP.
Code that uses the current namespace declaration:
namespace foo;
will assume that all unqualified function calls are global scope.
To use a function in the local namespace, it can be fully qualified
with:
\foo\MyFunction();
But, developers could also write:
namespace foo using local functions;
And all unqualified function names would be resolved to local at
compile time. Global functions could still be accessed with a \ if
this directive was used:
\array_key_exists();
OPTION THREE
Namespace lookup is removed from a future version of PHP.
Code that uses the current namespace declaration:
namespace foo;
...will assume that an unqualified function name is a global function
IF it is a PHP built-in function.
Otherwise, unqualified function names that are not PHP built-in
functions will be presumed to be local to the namespace.
With Option Three, developers can still fully-qualify their functions:
\foo\array_key_exists();
...to override a built-in name with a user function in the current
namespace.
Likewise, a fully-qualified:
\MyFunction();
called from inside a namespace will still call the global function.
Only unqualified names are affected.
As an additional optional feature of Option Three, developers can
change this behavior with:
namespace foo using global functions;
or
namespace foo using local functions;
Only unqualified names are affected by this directive, and they use
local only or global only, depending on the namespace declaration.
In both cases, fully-qualified names still work the same.
Of course, there are many other possibilities that can be mixed-and-
matched.
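To compare the options side by side, here is a rough userland model of how an unqualified name could be resolved under each rule. It is only a sketch: the function resolve_unqualified() and the strategy names are invented for illustration, it uses function_exists() at run time where the engine would decide at compile time, and foo\my_function() is a hypothetical local function.

```php
<?php
namespace foo {
    // Hypothetical local functions; one deliberately shadows a built-in name.
    function array_key_exists($key, array $array): bool { return false; }
    function my_function(): string { return 'local'; }
}

namespace {
    // Returns the fully qualified name that would be called, or null if none.
    function resolve_unqualified(string $name, string $namespace, string $strategy): ?string
    {
        $local = $namespace . '\\' . $name;
        $localExists = function_exists($local);
        $globalExists = function_exists($name);

        switch ($strategy) {
            case 'local-first':    // current behaviour
                return $localExists ? $local : ($globalExists ? $name : null);
            case 'global-first':   // flipped lookup order
                return $globalExists ? $name : ($localExists ? $local : null);
            case 'global-only':    // "using global functions"
                return $globalExists ? $name : null;
            case 'local-only':     // "using local functions"
                return $localExists ? $local : null;
            case 'builtin-global': // Option Three default
                if ($globalExists && (new ReflectionFunction($name))->isInternal()) {
                    return $name;
                }
                return $localExists ? $local : null;
        }
        throw new InvalidArgumentException("Unknown strategy: $strategy");
    }

    echo resolve_unqualified('array_key_exists', 'foo', 'local-first'), "\n";    // foo\array_key_exists
    echo resolve_unqualified('array_key_exists', 'foo', 'global-first'), "\n";   // array_key_exists
    echo resolve_unqualified('array_key_exists', 'foo', 'builtin-global'), "\n"; // array_key_exists
    echo resolve_unqualified('my_function', 'foo', 'builtin-global'), "\n";      // foo\my_function
}
```

The last two lines show the appeal of Option Three: built-in names go global without a lookup, while names that are not built-ins stay local.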
I personally would find option 3 to be the best of both worlds, and you don't even need the namespace ... using ... functions stuff.
— Rob
Totally sent that before finishing...
My only two concerns are:
- Calling functions in the current namespace. I don't want that syntax to change.
- Changing the order might make function autoloading impossible; forever.
If these concerns can be ameliorated, then I don't really care much about the specifics.
— Rob
having global as the default mode (even if we provide an option for local) is much less disruptive to existing code.
Hi Rowan,
I don't disagree with this summary of the current state, but I think this misses an important factor: namespaced functions are currently nowhere near as popular as namespaced classes, and a significant part of that is almost certainly because we don't have function autoloading, nor any kind of visibility controls for functions (eg package private).
Making relative function names do the opposite of relative class names sounds like a great way to permanently kill any prospects of encouraging developers to use regular namespaced functions in place of static classes as "bag of functions", which is what we keep hearing we should use - most notably on a recent RFC to embody the concept of a static class.
So we're told "no don't use classes for static functions like that, use proper functions".
We already can't autoload them which makes them less appealing, and less practical.
In a world where global functions take precedence over local ones because some people don't like writing a single \ character, autoloading would be a moot point because if you preference global functions you're implicitly telling developers they shouldn't write namespaced functions, by making them harder and less intuitive to use.
Cheers
Stephen
On 23 Aug 2024, at 15:29, Rowan Tommins [IMSoP] imsop.php@rwec.co.uk
wrote:
having global as the default mode (even if we provide an option for
local) is much less disruptive to existing code.
Hi Rowan,
I don't disagree with this summary of the current state, but I think this
misses an important factor: namespaced functions are currently nowhere near
as popular as namespaced classes, and a significant part of that is almost
certainly because we don't have function autoloading, nor any kind of
visibility controls for functions (eg package private).
Making relative function names do the opposite of relative class names
sounds like a great way to permanently kill any prospects of encouraging
developers to use regular namespaced functions in place of static classes
as "bag of functions", which is what we keep hearing we should use - most
notably on a recent RFC to embody the concept of a static class.
So we're told "no don't use classes for static functions like that, use
proper functions".
We already can't autoload them which makes them less appealing, and less
practical.
In a world where global functions take precedence over local ones because
some people don't like writing a single \ character, autoloading would be a
moot point because if you preference global functions you're implicitly
telling developers they shouldn't write namespaced functions, by making
them harder and less intuitive to use.
Cheers
Stephen
I've taken the time to carefully read Ilija's proposal and all followup
messages.
This is a great proposal, Ilija; it will immediately benefit 95%+ of the userbase.
It looks to me that the pros outweigh the cons, and Ilija has
done good research here already.
As for next steps, I'm suggesting that Ilija reach out to the core team at
Symfony, Zend, Laravel as well as WordPress, Magento, Drupal teams .. for
their consideration and input, and bring it back here/RFC.
The latter group of projects' codebases can be quite "quirky", and quite
creative, in how they use PHP compared to a traditional "framework".
We need to understand how this could negatively impact how their systems
are put together, and we definitely want to, upfront, identify anything we
wouldn't normally think of, that they can spot, since they know their own
codebases better than we do.
We already have the composer analysis, so once we have these
framework/project analyses we can make a strongly informed decision on
the impact, positive or negative, this change will make.
Many thanks,
Paul
Making relative function names do the opposite of relative class names
sounds like a great way to permanently kill any prospects of
encouraging developers to use regular namespaced functions in place of
static classes as "bag of functions", which is what we keep hearing we
should use - most notably on a recent RFC to embody the concept of a
static class.
That's why I brought up the point about making it easy to explicitly say "relative to current", just as you can explicitly say "relative to global" with a single "\".
It's also worth remembering that this whole discussion has no effect on using functions that are defined in a different namespace. The below code might benefit from function autoloading, but would not be helped, hurt, or changed in any way by any of the proposals in this thread:
namespace Acme\Foo\Controller;
use function Acme\StringUtils\better_strlen;
use Acme\StandardUtils as Std;
$foo = Std\generate_something();
$len = better_strlen($foo);
\Acme\Debug\out($len);
Regards,
Rowan Tommins
[IMSoP]
I understand what you are proposing, and I understand that it doesn't affect using other namespaces.
But let's be realistic here: if adopted this would mean developers have to understand and internalise two concepts of how relative-local (or unqualified if you prefer that term) symbol names work, that are completely opposite to each other, essentially forever, unless you expect yet another flop to follow this flip?
This change would also break existing code that does "the right thing", and has the potential to arbitrarily break perfectly valid userland code any time a new global function is added, forever.
And the reason for all this was...
prefixing all calls with \, or adding a use function at the top of every file is annoying and noisy.
For all the handwringing this list does over backwards compatibility breaks, it absolutely astounds me that there is even consideration of such a massive BC break, with the added bonus of guaranteed future BC breaks, to save typing one character per function call (or realistically, configuring your IDE/code linter to do it for you).
This whole discussion is predicated on "people don't use functions much so it's not a big BC break" literally at the same time as not one but two different discussions about an RFC for function autoloading, specifically because we're constantly being told that static classes are "wrong" and namespaced functions are "right".
Sorry to reply to the same message twice, but as a concrete example, consider this code:
I realise you replied to me twice, but for the sake of clarity let me be perfectly clear:
I understand absolutely what you are proposing.
I'm saying it's a ridiculous idea (the whole concept, not just your specific "solution") to propose that such a core facet of the language be changed, because some people both "want the absolute best performance" but also "don't like a leading backslash". Hidden BC breaks forever to gain 2-4% performance benefit without typing "\". If you told me my calendar is wrong and today is actually April 1st I would 100% believe you based on the contents of this thread.
Cheers
Stephen
This change would also break existing code that does "the right thing",
and has the potential to arbitrarily break perfectly valid userland
code any time a new global function is added, forever.
You replied to me, but you seem to be commenting on one of the other proposals. My preference is for "unqualified = global", which is a one-off breaking change, which only affects user-defined functions, which are declared in a namespace, and used in that same namespace.
You're right that it would mean classes and functions resolve differently, and that's why I said that if I had a time machine, I would support a different option. But, personally, I don't think the small long-term inconsistency outweighs the huge short-term disruption of defaulting to local.
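As a rough illustration of the case described here (a user-defined function declared in a namespace and called from that same namespace), the sketch below uses the existing namespace\ prefix as one way to say "relative to current" explicitly today. The Acme\Util namespace and its trim() helper are hypothetical names made up for this example, not code from the thread.

```php
<?php
namespace Acme\Util;

// Hypothetical helper that shares its short name with a built-in.
// It also strips dashes, so the two functions are easy to tell apart.
function trim(string $s): string
{
    return \trim($s, '- ');
}

function demo(string $s): array
{
    return [
        // Explicitly "relative to current": always Acme\Util\trim().
        'current' => namespace\trim($s),
        // Explicitly "relative to global": always the built-in trim().
        'global'  => \trim($s),
    ];
}
```

Both calls are unambiguous at compile time, so neither depends on which lookup order an unqualified trim() would use.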
Regards,
Rowan Tommins
[IMSoP]
Ok well I apologise, I thought you were proposing whatever "current namespace" syntax solution as optional, rather than required.
I stand by the rest of my argument though. This entire ridiculous discussion about a huge BC break that introduces bizarre inconsistencies, is 100% because a handful of people don't want to type \.
Are we going to go through this whole ridiculous dance for classes and interfaces and traits next too? They always need to be prepended with \ (or imported), so I can't begin to imagine what horrors that must be presenting for those poor souls who are allergic to a leading backslash.
Perhaps next we can go back to having register_globals and forcing it to always on, because people don't want to type $_GET or $_POST. Perhaps after that we can bring back magic quotes because parameterised queries are too much to type?
I stand by the rest of my argument though. This entire ridiculous discussion about a huge BC break that introduces bizarre inconsistencies, is 100% because a handful of people don't want to type \.
Again, I'm not sure which straw man you're attacking. The largest BC break would be to require users to type the leading backslash, in exchange for removing the current "bizarre inconsistencies" and making functions resolve the same way as classes.
Other proposals aim to shift that balance - leaving some inconsistency, but less compatibility break.
And most users don't object to using a leading backslash, they just (quite reasonably) have no idea what impact it has on the ability of the engine to optimise their code.
Regards,
Rowan Tommins
[IMSoP]
And most users don't object to using a leading backslash, they just (quite reasonably) have no idea what impact it has on the ability of the engine to optimise their code.
I think this is a misread, and I don't think you can argue that there is a clear understanding of "most users" here. While I admit I too don't have real data to back this up, I would be more likely to believe "most users" would absolutely object to being forced into using a leading backslash and would find it non-sensical that you must add a backslash for the engine to do the "right thing" (in this case, optimize their code with a security benefit), vs just doing the right thing by default.
Hi John,
Now please don't misunderstand me, I am not advocating for a change to drop the fallback to global symbols. I wouldn't be against such a change but I don't think it's a problem that needs to be solved right now.
But if that were the proposal, I have to ask about something you said:
would find it non-sensical that you must add a backslash for the engine to do the "right thing" (in this case, optimize their code with a security benefit), vs just doing the right thing by default.
What do you mean by this? What is "the right thing"?
Are you saying that you think people would expect an unqualified function name to automatically act like a global function name, and never check for a local namespaced function of the same name? Just to be clear, you understand that is the literal opposite of how classes/traits/interfaces work in PHP, yes?
I'm not saying that isn't necessarily how people think, I have literally zero data about this besides my own thoughts, but it seems like a bizarre idea that people would expect function name resolution to work completely opposite to how class/class-like name resolution works.
Cheers
Stephen
would find it non-sensical that you must add a backslash for the engine to do the "right thing" (in this case, optimize their code with a security benefit), vs just doing the right thing by default.
What do you mean by this? What is "the right thing"?
I mean this:
<?php
// something.php
namespace App\Models;
function password_hash(string $password, string|int|null $algo, array $options = []): string
{
print("Hello");
return $password;
}
<?php
// my code
namespace App\Models;
include "something.php";
password_hash('foobar', PASSWORD_DEFAULT);
This code IMO shouldn't print "Hello", but it does. The current behavior of looking up the local namespace first for functions, instead of the global namespace first, IMO is the "wrong thing" because it expects developers to fully qualify internal function calls every single time because, under very specific legitimate use cases that are pretty rare in the course of normal development, you actually might want to do that.
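For comparison, here is a minimal sketch of how the same shadowing behaves with and without the leading backslash under the current lookup rules. It reuses a trimmed copy of the shadowing function above (without the print call); hash_unqualified() and hash_qualified() are hypothetical wrappers added only for illustration.

```php
<?php
namespace App\Models;

// Trimmed copy of the shadowing function from the example above.
function password_hash(string $password, string|int|null $algo, array $options = []): string
{
    return $password; // returns the plaintext instead of a hash
}

function hash_unqualified(string $password): string
{
    // Unqualified: App\Models\password_hash() is tried first and wins.
    return password_hash($password, PASSWORD_DEFAULT);
}

function hash_qualified(string $password): string
{
    // Fully qualified: always the built-in, whatever is defined locally.
    return \password_hash($password, \PASSWORD_DEFAULT);
}
```

Only the unqualified call is affected by the local definition; the fully qualified call reaches the built-in in every PHP version that supports namespaces.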
I'm not saying that isn't necessarily how people think, I have literally zero data about this besides my own thoughts, but it seems like a bizarre idea that people would expect function name resolution to work completely opposite to how class/class-like name resolution works.
At its core a vast majority of the functionality of the PHP language exists within internally-implemented functions, not classes. So yes, I think it's entirely reasonable that people would expect that internal functions resolve at a higher priority than user-defined functions with the same name, and that if you'd like to reuse a global namespaced function in your local namespace you need to be explicit about that -- not the other way around.
What do you mean by this? What is "the right thing"?
Also, faster code vs. slower code by default is "the right thing" too.
I stand by the rest of my argument though. This entire ridiculous discussion about a huge BC break that introduces bizarre inconsistencies, is 100% because a handful of people don't want to type \.
Again, I'm not sure which straw man you're attacking.
I'm not attacking any straw man. I'm calling out the absolute absurdity of breaking code that has worked for 15 years based on the claim that some people don't want to type a leading backslash. That isn't my claim, that's the claim of the original email.
The largest BC break would be to require users to type the leading backslash,
So don't then. If people want to rely on the existing fallback as it has existed for 15 years, and don't care about the performance penalty, let them.
in exchange for removing the current "bizarre inconsistencies" and making functions resolve the same way as classes.
The current inconsistencies between symbol types can be avoided in userland in a 100% consistent way. Import or qualify the symbols you use, all the time, and you have 0 inconsistencies or bizarreness in terms of what is used when.
Regardless of the specific flavour, swapping the lookup order for some symbols to look for global symbols when making unqualified references introduces a hard inconsistency that cannot be rectified in userland, and some flavours (i.e. the original proposal) introduce ongoing-forever BC breakage issues.
Other proposals aim to shift that balance - leaving some inconsistency, but less compatibility break.
Great, how about the solution that doesn't have any BC, and works in every version back to 5.3?
And most users don't object to using a leading backslash
Once again, I didn't claim most users object. I was specifically pointing out that a small number of people complaining about this is a ridiculous reason to even consider the change. Hell the original issue that Ilija referenced makes this outrageous claim, so perhaps direct your "most users" response at the person claiming to represent the views of all users on the planet:
All PHP projects in the World are a bit slower than they could be...
, they just (quite reasonably) have no idea what impact it has on the ability of the engine to optimise their code.
Great, so then we can resolve this whole thing by adding a footnote to the "Name resolution rules" page in the manual that (a) recommends using qualified names (i.e. prefix with a \) and (b) provides deeper details of the reasons for those who care.
Regards,
Rowan Tommins
[IMSoP]
The current inconsistencies between symbol types can be avoided in userland in a 100% consistent way. Import or qualify the symbols you use, all the time, and you have 0 inconsistencies or bizarreness in terms of what is used when.
So are you essentially arguing that we should put the burden on the majority of users, most of whom (documented by us or not) likely will have no idea what the problem is or potential consequences are? BC breaks happen. While I am all for avoiding BC breaks when possible, sometimes they make sense -- and I think this is a clear example of when it does.
I think you are exaggerating the impact of the BC break here. In fact Ilija measured the impact on the top 1000 composer packages:
https://gist.github.com/iluuu1994/4b83481baac563f8f0d3204c697c5551
I was specifically pointing out that a small number of people complaining about this is a ridiculous reason to even consider the change.
That's one take. Another take is this is an easy win for a few percentage points bump in speed, with improved supply-chain security for composer packages that has a minimal impact on users.
Great, how about the solution that doesn't have any BC, and works in every version back to 5.3?
By this logic, we should never introduce BC breaks.
Great, so then we can resolve this whole thing by adding a footnote to the "Name resolution rules" page in the manual that (a) recommends using qualified names (i.e. prefix with a \) and (b) provides deeper details of the reasons for those who care.
From the perspective of programming language design (which is what we're talking about here), the goal is to create a language that helps the developer do something faster/better/easier, not do the wrong thing (slower code, etc.) by default and dump the responsibility for that on developers by expecting them to read a footnote buried in a doc. Especially when the justification is that there are concerns that code written in 2009 won't work anymore.
The current inconsistencies between symbol types can be avoided in userland in a 100% consistent way. Import or qualify the symbols you use, all the time, and you have 0 inconsistencies or bizarreness in terms of what is used when.
So are you essentially arguing that we should put the burden on the majority of users, most of whom (documented by us or not) likely will have no idea what the problem is or potential consequences are?
No, I'm saying that the "problem" of performance has had a pretty simple, consistent solution since namespaces were added to the language, and that for the vast, vast majority of projects it's a stretch to even classify it as a problem.
The claims about "security" because a function you defined (or included via a package) is resolved in place of a global one are irrelevant. If you're including compromised code in your project, all bets are off.
BC breaks happen. While I am all for avoiding BC breaks when possible, sometimes they make sense -- and I think this is a clear example of when it does.
Please be specific what you mean by "this". The original proposal by Ilija provides a constant BC break over time, whenever a new global function is introduced.
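To make the "break whenever a new global function is introduced" concern concrete, here is a rough userland model of the two lookup orders. It is only a simulation built on function_exists(), not how the engine resolves calls internally. The App\Text namespace, its case-insensitive str_contains() and the resolve_call() helper are all hypothetical, and the sketch assumes PHP 8.0 or later, where a global str_contains() exists.

```php
<?php
namespace App\Text;

// Hypothetical helper written before PHP 8.0 added a global str_contains().
// Unlike the built-in, this one is case-insensitive.
function str_contains(string $haystack, string $needle): bool
{
    return \stripos($haystack, $needle) !== false;
}

// Userland model of an unqualified call: try the candidates in order and
// return the name of the first function that exists, or null if none does.
function resolve_call(string $name, string $namespace, bool $globalFirst): ?string
{
    $local = $namespace . '\\' . $name;
    $candidates = $globalFirst ? [$name, $local] : [$local, $name];

    foreach ($candidates as $candidate) {
        if (\function_exists($candidate)) {
            return $candidate;
        }
    }

    return null;
}
```

In this model, a local-first lookup keeps picking App\Text\str_contains, while a global-first lookup switches to the built-in as soon as the PHP version in use ships one, changing the result of a call such as str_contains('Hello', 'hello') without any edit to the calling code.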
I think you are exaggerating the impact of the BC break here. In fact Ilija measured the impact on the top 1000 composer packages:
https://gist.github.com/iluuu1994/4b83481baac563f8f0d3204c697c5551
Great, so 0.24% of public packages represented, and 0% of private code represented. That certainly seems representative.
You've also missed the other aspect here, which I mentioned earlier: namespaced function usage is low because the language hasn't traditionally supported it anywhere near as well as namespaced classes. There have been multiple people proclaiming recently that "static utility classes" are the 'wrong' approach, that people should use namespaced functions in their code. There are two active RFCs about function autoloading.
This change would at best, make those functions slower to use within the same namespace, and at worst, more work, with a brand new inconsistency, to use within the same namespace.
I was specifically pointing out that a small number of people complaining about this is a ridiculous reason to even consider the change.
That's one take. Another take is this is an easy win for a few percentage points bump in speed, with improved supply-chain security for composer packages that has a minimal impact on users.
I was clarifying (to someone else) that the claim about who objects or doesn't, was never mine. It's a bit weird that in one email you admit you have no data, and in the next claim "minimal impact on users". Either you have data or you don't.
Great, how about the solution that doesn't have any BC, and works in every version back to 5.3?
By this logic, we should never introduce BC breaks.
We should aim to reduce BC breaks as much as possible, and especially BC breaks that have an ongoing impact over time (i.e. new breaks into the future). The point is that every single technical problem pointed out in the original issue, and the email that arose from it, can be solved, and could have been solved 15 years ago, by using a \ for global functions (or using a use statement), exactly the same way you do with global classes and interfaces.
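For readers unfamiliar with the second option, this is a small sketch of the use function form, which disambiguates every call to the imported names in a file without prefixing each call site. The App\Report namespace and the lengths() function are hypothetical names chosen for the example.

```php
<?php
namespace App\Report;

// Import the global functions once; unqualified calls below now refer
// to the built-ins and are no longer subject to the namespace fallback.
use function array_map;
use function strlen;

function lengths(array $words): array
{
    return array_map(static fn (string $w): int => strlen($w), $words);
}
```

The trade-off raised elsewhere in the thread still applies: the import list has to be maintained in every file that calls global functions.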
Great, so then we can resolve this whole thing by adding a footnote to the "Name resolution rules" page in the manual that (a) recommends using qualified names (i.e. prefix with a \) and (b) provides deeper details of the reasons for those who care.
From the perspective of programming language design (which is what we're talking about here), the goal is to create a language that helps the developer do something faster/better/easier, not do the wrong thing (slower code, etc.) by default and dump the responsibility for that on developers by expecting them to read a footnote buried in a doc. Especially when the justification is that there are concerns that code written in 2009 won't work anymore.
To be clear, we aren't "creating" a language. We're talking about a hypothetical change to a core aspect of an existing language, that is used by literally millions of developers around the planet.
The change we're talking about is in the range of maybe 2-4%, and is 100% solvable in userland - and has been for those 15 years, in a way that has zero impact on developers using the language to write their own functions, and is consistent with the way other symbol lookups (e.g. classes) work. I'll concede you one point. A footnote is clearly not important enough for a 2% performance benefit. Let's make it the subtext on the header of every php.net page, just to make sure people know.
I mean this:
I'm honestly not even sure where to begin here. If you add a namespaced function to your code, and call it from within that namespace, it will run. That's literally by design. If that is somehow surprising to you, I'd suggest the aforementioned name resolution page in the php manual. It's not exactly long, you can probably read it quicker than this email.
As I and others have said: if your project has a credible security risk because of this functionality, you have bigger problems than needing to use a leading backslash.
At its core a vast majority of the functionality of the PHP language exists within internally-implemented functions, not classes.
There are a lot of procedural APIs in the standard library/extensions, sure. But people still use classes a lot, and there is in general a push towards more OOP APIs and fewer groups of functions - particularly for anything with state (e.g. see recent discussions and RFCs about Curl objects, HTTP Request data objects, BCMath Number object, Tokenizer, etc).
So yes, I think it's entirely reasonable that people would expect that internal functions resolve at a higher priority than user-defined functions with the same name
OK, you can think that. I don't agree. One of the top, if not the top thing people complain about PHP is inconsistencies in the standard library - the order of needle/haystack arguments being different in string vs array functions is probably one of the most well known.
The claims about "security" because a function you defined (or included via a package) is resolved in place of a global one are irrelevant. If you're including compromised code in your project, all bets are off.
I have plenty of experience behind why I disagree with your dismissal here, but I'm not going to debate it with you.
BC breaks happen. While I am all for avoiding BC breaks when possible, sometimes they make sense -- and I think this is a clear example of when it does.
Please be specific what you mean by "this". The original proposal by Ilija provides a constant BC break over time, whenever a new global function is introduced.
I don't think we're piling in functions all the time here -- and that's why we have deprecation notices when we do so we can warn users to rename a conflicting function.
Great, so 0.24% of public packages represented, and 0% of private code represented. That certainly seems representative.
Honestly, statistically it actually is fairly meaningful and representative. I have serious doubts if you went from 1000 to 10000 you'd see much change (and would welcome that information if I'm wrong).
You've also missed the other aspect here, which I mentioned earlier: namespaced function usage is low because the language hasn't traditionally supported it anywhere near as well as namespaced classes. There have been multiple people proclaiming recently that "static utility classes" are the 'wrong' approach, that people should use namespaced functions in their code. There are two active RFCs about function autoloading.
This change would at best, make those functions slower to use within the same namespace, and at worst, more work, with a brand new inconsistency, to use within the same namespace.
The fact that functions have not been widely embraced as you argue would be an argument for having this debate now, rather than later after further adding to the complexity.
The change we're talking about is in the range of maybe 2-4%, and is 100% solvable in userland - and has been for those 15 years, in a way that has zero impact on developers using the language to write their own functions, and is consistent with the way other symbol lookups (e.g. classes) work. I'll concede you one point. A footnote is clearly not important enough for a 2% performance benefit. Let's make it the subtext on the header of every php.net page, just to make sure people know.
By this logic we shouldn't have touched list, nor should we ever add any functionality that can't be implemented in userspace using existing tools. E.g. array_column
I'm honestly not even sure where to begin here. If you add a namespaced function to your code, and call it from within that namespace, it will run. That's literally by design. If that is somehow surprising to you, I'd suggest the aforementioned name resolution page in the php manual. It's not exactly long, you can probably read it quicker than this email.
You say it's by design, I say it's a bad design and should be fixed.
The claims about "security" because a function you defined (or included via a package) is resolved in place of a global one are irrelevant. If you're including compromised code in your project, all bets are off.
I have plenty of experience behind why I disagree with your dismissal here, but I'm not going to debate it with you.
If you're going to make a claim, but then refuse to back it up with any kind of evidence when challenged, myself and most others are going to ignore your claim, FYI.
BC breaks happen. While I am all for avoiding BC breaks when possible, sometimes they make sense -- and I think this is a clear example of when it does.
Please be specific what you mean by "this". The original proposal by Ilija provides a constant BC break over time, whenever a new global function is introduced.
I don't think we're piling in functions all the time here -- and that's why we have deprecation notices when we do so we can warn users to rename a conflicting function.
How do you have a deprecation notice for a new function before it exists? If you took this approach, any new global functions would have to have an RFC and then wait until the next major version to actually be implemented - so for example, str_contains, str_starts_with and str_ends_with were all added in PHP 8.0, back in 2020 - four years ago. But because the RFCs were all conducted after the release of 7.4 (the last 7.x release), none of those functions would be eligible to introduce into the language yet, because a deprecation notice introduced with 8.0 won't enforce its warned change until 9.0, which we don't even have a date for yet.
Taking this approach, the following functions would currently still be waiting in limbo for 9.0 to arrive and become available:
str_contains
str_starts_with
str_ends_with
get_debug_type
fdiv
array_is_list
memory_reset_peak_usage
ini_parse_quantity
str_increment
str_decrement
stream_context_set_options
And that's just considering the "standard" or "core" functions - if you include extensions that are considered "core" there's a heap more.
Great, so 0.24% of public packages represented, and 0% of private code represented. That certainly seems representative.
Honestly, statistically it actually is fairly meaningful and representative. I have serious doubts if you went from 1000 to 10000 you'd see much change (and would welcome that information if I'm wrong).
You've also missed the other aspect here, which I mentioned earlier: namespaced function usage is low because the language hasn't traditionally supported it anywhere near as well as namespaced classes. There have been multiple people proclaiming recently that "static utility classes" are the 'wrong' approach, that people should use namespaced functions in their code. There are two active RFCs about function autoloading.
This change would at best, make those functions slower to use within the same namespace, and at worst, more work, with a brand new inconsistency, to use within the same namespace.
The fact that functions have not been widely embraced as you argue would be an argument for having this debate now, rather than later after further adding to the complexity.
That flies in the face of relatively common logic taken on most RFCs these days, where potential impact on future features is a deliberate consideration, so as not to negatively impact them, rather than an attempt to "get in first" regardless of negative consequences down the line.
The change we're talking about is in the range of maybe 2-4%, and is 100% solvable in userland - and has been for those 15 years, in a way that has zero impact on developers using the language to write their own functions, and is consistent with the way other symbol lookups (e.g. classes) work. I'll concede you one point. A footnote is clearly not important enough for a 2% performance benefit. Let's make it the subtext on the header of every php.net page, just to make sure people know.
By this logic we shouldn't have touched list, nor should we ever add any functionality that can't be implemented in userspace using existing tools. E.g. array_column
Are you referring to the [$a, $b] = $array; syntactic sugar? You can't implement that in userland, and it didn't introduce any BC breaks, so I don't see how it's relevant. array_column is considered to be pretty common functionality, and the C version is about 5x faster than the equivalent userland implementation (comparing against the polyfill the author of array_column provided) under PHP 8.3, and was about 6x faster under PHP 5.5 when it was introduced.
I'm honestly not even sure where to begin here. If you add a namespaced function to your code, and call it from within that namespace, it will run. That's literally by design. If that is somehow surprising to you, I'd suggest the aforementioned name resolution page in the php manual. It's not exactly long, you can probably read it quicker than this email.
You say it's by design, I say it's a bad design and should be fixed.
Can you give an example of any other language where global symbols take precedence over locally defined symbols of the same name?
On Fri, Aug 23, 2024 at 5:49 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
Other proposals aim to shift that balance - leaving some inconsistency, but less compatibility break.
And most users don't object to using a leading backslash, they just (quite reasonably) have no idea what impact it has on the ability of the engine to optimise their code.
For some context: Before proposing this change, I asked Symfony if
they were interested in disambiguating calls to improve performance,
given I did some work to make fully qualified calls faster. But they
were not, stating that the change would be too verbose for their
liking.
Making unqualified calls to mean local would force Symfony into making
the change, which is not the approach I'm interested in taking. Making
them global would likely reduce breakage by much, but not nearly as
much as keeping the fallback.
From reading the responses, it seems we have three primary camps:
1. People who don't think BC is a problem, and would like to drop
either the global or local lookup entirely, requiring disambiguation.
2. People who do think BC is a problem, and would like some opt-in
mechanism to explicitly pick global or local scope.
3. People who aren't convinced that the performance improvements are
worth it to begin with, or that the developers themselves are
responsible for disambiguation.
IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it? Consistency with class
lookups is a factor, but is it enough to break a large portion of
codebases? The summed up time of every maintainer installing and
running a tool that modifies a large portion of the codebase, and then
dealing with conflicts in existing branches is not minuscule. Fixing
local calls will also require context from other files to correctly
disambiguate. I'm not aware if any tools actually consider context, or
just take the naive approach of making known, internal calls global,
and leaving the rest.
2. misses the point of the immediate performance gains without
modifications to the codebase. Even if the disambiguation itself is a
one-liner, it still needs to be added to every codebase and every
file, and still requires fixing actual local calls that may be made
within the same file.
I obviously also disagree with 3. as I wouldn't have sent this
proposal otherwise. :) Performance improvements are hard to come by
nowadays. It was measured on real codebases (Symfony and Laravel).
Ilija
I obviously also disagree with 3. as I wouldn't have sent this
proposal otherwise. :) Performance improvements are hard to come by
nowadays. It was measured on real codebases (Symfony and Laravel).
Hi Ilija,
Just to make sure I'm not misunderstanding something here - Symfony and Laravel (or anyone else) can get the exact same performance benefit you're talking about, if they use either a leading \ or use function... in their own code, today, correct?
I'm not going to pretend to know how many files either project has that uses global functions, but neither of those two existing options seems particularly "hard to come by" in my mind.
Cheers
Stephen
I obviously also disagree with 3. as I wouldn't have sent this
proposal otherwise. :) Performance improvements are hard to come by
nowadays. It was measured on real codebases (Symfony and Laravel).
Just to make sure I'm not misunderstanding something here - Symfony and Laravel (or anyone else) can get the exact same performance benefit you're talking about, if they use either a leading \ or use function... in their own code, today, correct?
As I wrote in my e-mail:
For some context: Before proposing this change, I asked Symfony if
they were interested in disambiguating calls to improve performance,
given I did some work to make fully qualified calls faster. But they
were not, stating that the change would be too verbose for their
liking.
Making unqualified calls to mean local would force Symfony into making
the change, which is not the approach I'm interested in taking.
Yes, the full performance benefits can be achieved by prefixing your
entire codebase, which includes not only your own code but also the
code of your framework. How much you benefit from just converting your
own code of course depends on how much you rely on vendor code.
I'm not going to pretend to know how many files either project has that uses global functions, but neither of those two existing options seems particularly "hard to come by" in my mind.
The "hard to come by" part is referring to the engine, which is quite
optimized for the current semantics. Some of PHP's quirky semantics
make it hard to improve it further, this being one of them.
The "hard to come by" part is referring to the engine, which is quite
optimized for the current semantics. Some of PHP's quirky semantics
make it hard to improve it further, this being one of them.
Thanks for clarifying. Out of curiosity, how much optimisation do you imagine would be possible if the lookups were done the same way as classes (i.e. no fallback, names must be local, qualified or imported with use)?
I am aware this is a BC break. But if it's kosher to discuss introducing a never ending BC break I don't see why this isn't a valid discussion either. It would give everyone that elusive 2-4% performance boost, would resolve any ambiguity about which function a person intended to call (the claimed security issue) and would bring consistency with the way classes/etc are referenced.
Cheers
Stephen
Hi Stephen
Thanks for clarifying. Out of curiosity, how much optimisation do you imagine would be possible if the lookups were done the same way as classes (i.e. no fallback, names must be local, qualified or imported with use)?
I haven't measured this case specifically, but if unqualified calls to
local functions are indeed rare (which the last analysis seems to
indicate), then it should make barely any difference. Of course, if
your code makes lots of use of them, then the story might be
different. That said, the penalty of an ambiguous internal call is
much higher than that of a user, local call, given that internal calls
sometimes have special optimizations or can even be entirely executed
at compile time. For local calls, it will simply lead to a double
lookup on first execution.
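For illustration, a minimal sketch of the ambiguity being discussed, under current PHP semantics. The App namespace and the shadowing function are hypothetical and exist only to make the lookup order visible; real code rarely shadows strlen() like this.

```php
<?php
namespace App;

// Hypothetical namespaced function that shadows the global strlen()
// for unqualified calls made inside the App namespace.
function strlen(string $s): int
{
    return -1;
}

// Unqualified call: with the current lookup order, App\strlen() is
// tried first, so the compiler cannot assume the internal function.
$unqualified = strlen('abc');

// Fully qualified call: always the global, internal strlen(), which
// is what allows the compile-time optimizations mentioned above.
$qualified = \strlen('abc');
```

With the lookup order flipped as proposed, the unqualified call would presumably resolve to the global function instead, which is exactly the kind of call that would need explicit qualification.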
I am aware this is a BC break. But if it's kosher to discuss introducing a never ending BC break I don't see why this isn't a valid discussion either. It would give everyone that elusive 2-4% performance boost, would resolve any ambiguity about which function a person intended to call (the claimed security issue) and would bring consistency with the way classes/etc are referenced.
From my analysis, there were 2 967 unqualified calls to local
functions in the top 1 000 repositories. (Disclaimer: There might be a
"use function" at the top for some of these, the analysis isn't that
sophisticated.)
I also ran the script to check for unqualified calls to global
functions (or at least functions that weren't statically visible in
that scope in any of the repositories files), and there were ~139 000
of them. It seems like this is quite a different beast. To summarize:
1. Flipping lookup order: a few dozen changes
2. Global only: ~3 000 changes
3. Local only: ~139 000 changes
While much of this can be automated, huge diffs still require
reviewing time, and can lead to many merge conflicts which also take
time to resolve. I would definitely prefer to go with 1. or
potentially 2.
Ilija
Hi Ilija,
I understand that a change like (3) is a huge BC break, and as I said earlier, I wasn't actually suggesting that is the action to take, because I don't think there is sufficient reason to take any action. But given that some people in this thread seem convinced that a change to functionality is apparently required, I do think every potential change, and its pros and cons, should be discussed.
As I've said numerous times, and been either outright dismissed or ignored: there has been a consistent push from a non-trivial number of internals members that userland developers should make better use of regular functions, rather than using classes as fancy namespaces. There was a recent RFC vote that implicitly endorsed this opinion.
Right now, the lookup rules make namespaced regular functions a consistent experience for developers, but the lack of autoload makes it unpopular, and the lack of visibility for such symbols can be problematic.
With the change you're proposing, there will be another hurdle that makes the use of regular namespaced functions harder/less intuitive, or potentially (with option 1) unpredictable over PHP versions, due to the constant threat of BC breaks due to new builtin functions - right when we have not one but two RFCs for function autoloading (arguably the biggest barrier to their increased usage in userland).
So the whole reason I asked about (3) is because it would
- (a) bring consistency with class/interface/trait symbols;
- (b) inherently bring the much desired 2% performance boost for function calls, because people would be forced to qualify the names;
- (c) have zero risk of a future WTF BC break from a new global function interrupting local function lookups;
- (d) have no need for a new "simpler" qualifying syntax (you can't get shorter than 1 character);
- (e) presumably simplify function autoloading, because there's no longer any "fallback" step to worry about before triggering an autoloader;
- (f) even solve the "security" concerns John raised, because the developer would be forced to qualify their usage if they wanted to use the builtin function - their intent is always explicit, never guessed.
Yes, it is a huge BC break in terms of the amount of code that's affected. But it's almost certainly one of the simplest BC breaks to "fix" in the history of PHP BC breaks.
How much code was affected when register globals was removed? Or when magic quotes was removed? Or when short codes were removed?
Surely any change being proposed here would mean a deprecation notice in the next release after 8.4, and then whatever actual change is proposed, in the next major version after that. So possibly 8.5 and then 9.0, but potentially 9.0 and then 10.0.
If either of (1) or (2) is chosen, and the "acceptability" of such a choice depends on something less verbose than "namespace" to qualify a local function, projects literally can't future proof (or no-deprecation-notice-proof, if you prefer) their code against the eventual flip until that change is implemented. In a scenario where a deprecation (and new local qualifier) goes out in 2025 as part of 8.5, and a flip happens in 2026 as part of 9.0, that would cut the time projects have to effectively adapt in half, and it means any code that's updated for it can't make use of the new "less verbose" local qualifier if it also needs to support versions prior to it being available.
If it happened to be 9.0 and 10.0 being the deprecation and "change" versions, obviously people have longer to make the required change - but that argument cuts both ways. If you have 5 years to change every strlen to \strlen it's hardly going to cause a huge and sudden swath of noise in revision history. I would imagine most projects would just adopt a new code style, and prefixing with \ would occur automatically whenever a file is otherwise modified by a developer.
There's also an impact on internals development/RFC with either (1) or (2): any proposed new global function in the standard library now has a BC barrier to pass if it might conflict with one defined by anyone in userland, in any namespace. JS is a living embodiment of this problem: see String#includes, Array#includes, and Array#flat - and that's with people doing the "wrong thing" (extending builtin JS prototypes is arguably the same as using the \php namespace).
Multiple people have lamented the way function fallbacks were originally implemented. If you're going to insist on making a change, let's at least aim for a change that brings MORE consistency, and fixes a previous mistake, rather than adding a brand new inconsistency and who knows how many years of unexpected BC breaks for unsuspecting userland developers - who apparently already struggle to understand the way symbol lookup happens - into the future, and adding yet another reason for people to not use namespaced functions.
Cheers
Stephen
It may be worth waiting for function autoloading, to be honest. One of the nice things about it is that you get called when using non-qualified globals. This makes it very easy for an autoloader to start forcing qualified globals and emitting warnings/exceptions. I have a feeling that, eventually, if function autoloading gets more use and accepted into php, we will see people using more and more qualified globals.
Ergo, I suspect option (3) will become the default, eventually. Unless (2) is chosen, of course.
— Rob
- People who don't think BC is a problem, and would like to drop
either the global or local lookup entirely, requiring disambiguation.
There is also an option of swapping the priority, making local lookups secondary to global lookups -- and to override that behavior you would require disambiguation. It doesn't have to mean drop the lookup entirely.
Right, that's my proposal. My point was that few (nobody except you
and me, I believe) showed particular excitement for this approach.
On Fri, Aug 23, 2024 at 8:19 PM John Coggeshall john@coggeshall.org wrote:
> 1. People who don't think BC is a problem, and would like to drop
> either the global or local lookup entirely, requiring disambiguation.
>
> There is also an option of swapping the priority, making local lookups secondary to global lookups -- and to override that behavior you would require disambiguation. It doesn't have to mean drop the lookup entirely.
Right, that's my proposal. My point was that few (nobody except you and me, I believe) showed particular excitement for this approach.
There is also the 5th (or 4th?) option of making built-ins special and not require them nor locals be prefixed, and instead require global userland functions to be prefixed.
-Mike
On Fri, Aug 23, 2024 at 5:49 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
Other proposals aim to shift that balance - leaving some inconsistency, but less compatibility break.
And most users don't object to using a leading backslash, they just (quite reasonably) have no idea what impact it has on the ability of the engine to optimise their code.
For some context: Before proposing this change, I asked Symfony if
they were interested in disambiguating calls to improve performance,
given I did some work to make fully qualified calls faster. But they
were not, stating that the change would be too verbose for their
liking.
I think if someone values code beauty more than speed, they can do that. Although, I find that rather hilarious when their code base is littered with goto, “for speed.”
What it really sounds like is that they realized you would just change the language and they wouldn’t have to review those changes… /s
Making unqualified calls to mean local would force Symfony into making
the change, which is not the approach I'm interested in taking. Making
them global would likely reduce breakage by much, but not nearly as
much as keeping the fallback.
I don’t think changing the language for a specific framework(s) is a good idea.
From reading the responses, it seems we have three primary camps:
1. People who don't think BC is a problem, and would like to drop
either the global or local lookup entirely, requiring disambiguation.
2. People who do think BC is a problem, and would like some opt-in
mechanism to explicitly pick global or local scope.
3. People who aren't convinced that the performance improvements are
worth it to begin with, or that the developers themselves are
responsible for disambiguation.
IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it? Consistency with class
lookups is a factor, but is it enough to break a large portion of
codebases? The summed up time of every maintainer installing and
running a tool that modifies a large portion of the codebase, and then
dealing with conflicts in existing branches is not minuscule. Fixing
local calls will also require context from other files to correctly
disambiguate. I'm not aware if any tools actually consider context, or
just take the naive approach of making known, internal calls global,
and leaving the rest.
Aren’t we doing that anyway with your proposal? Sure, maybe that doesn’t require (much) changes, right now. But there is an RFC being discussed right now which introduces a new function called “parse_html”
This seems like a super generic name that did not take global-first into account. I know for a fact I have seen code with that exact function name several times in my career.
2. misses the point of the immediate performance gains without
modifications to the codebase. Even if the disambiguation itself is a
one-liner, it still needs to be added to every codebase and every
file, and still requires fixing actual local calls that may be made
within the same file.
I obviously also disagree with 3. as I wouldn't have sent this
proposal otherwise. :) Performance improvements are hard to come by
nowadays. It was measured on real codebases (Symfony and Laravel).
Ilija
Applications are more likely to get better performance gains in Symfony by uninstalling Doctrine and writing optimized queries, to be completely honest.
— Rob
IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it?
I can think of a few disadvantages of "global first":
- Fewer code bases will be affected, but working out which ones is harder. The easiest migration will probably be to make sure all calls to namespaced functions are fully qualified, as though it was "global only".
- Even after the initial migration, users will have to watch out for new conflicting global functions. Again, this can be avoided by just pretending it's "global only".
- The engine won't be able to optimise calls where the name exists locally but not globally, because a userland global function could be defined at any time.
- Unlike with the current way around, there's unlikely to be a use case for shadowing a namespaced name with a global one; it will just be a gotcha that trips people up occasionally.
None of these seem like showstoppers to me, but since we can so easily go one step further to "global only", and avoid them, why wouldn't we?
Your answer to that seems to be that you think "global only" is a bigger BC break, but I wonder how much difference it really makes. As in, how many codebases are using unqualified calls to reference a namespaced function, but not shadowing a global name?
Regards,
Rowan Tommins
[IMSoP]
None of these seem like showstoppers to me, but since we can so easily go one step further to "global only", and avoid them, why wouldn't we?
FWIW I'd support global only, specifically because I wouldn't want the change to hamstring our ability to add new functions in the future.
Your answer to that seems to be that you think "global only" is a bigger BC break, but I wonder how much difference it really makes. As in, how many codebases are using unqualified calls to reference a namespaced function, but not shadowing a global name?
I can think of more than one one-off script where I have written something like this:
namespace blah;
function read_and_process_file(): array {
}
function do_something(array $file): void { }
$file = read_and_process_file();
var_dump($file);
// die(); // debug
do_something($file);
If it were global only, then how would I call those files? namespace\read_and_process_file()?
That seems worse ergonomics and not better, for very little gain.
— Rob
If it were global only, then how would I call those files? namespace\read_and_process_file()?
See my earlier posts, particularly https://externals.io/message/124718#125098 and https://externals.io/message/124718#125125
Rowan Tommins
[IMSoP]
On Fri, Aug 23, 2024 at 9:41 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:
IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it?
I can think of a few disadvantages of "global first":
- Fewer code bases will be affected, but working out which ones is harder. The easiest migration will probably be to make sure all calls to namespaced functions are fully qualified, as though it was "global only".
To talk about more concrete numbers, I now also analyzed how many
relative calls to local functions there are in the top 1000 composer
packages.
https://gist.github.com/iluuu1994/9d4bbbcd5f378d221851efa4e82b1f63
There were 4229 calls to local functions that were statically visible.
Of those, 1534 came from thecodingmachine/safe, which I'm excluding
again for a fair comparison. The remaining 2695 calls were split
across 210 files and 27 repositories, which is less than I expected.
The calls that need to be fixed by swapping the lookup order are a
subset of these calls, namely only the ones also clashing with some
global function. Hence, the process of identifying them doesn't seem
fundamentally different. Whether the above are "few enough" to justify
the BC break, I don't know.
- The engine won't be able to optimise calls where the name exists locally but not globally, because a userland global function could be defined at any time.
When relying on the lookup, the lookup will be slower. But if the
hypothesis is that there are few people relying on this in the first
place, it shouldn't be an issue. It's also worth noting that many of
the optimizations don't apply anyway, because the global function is
also unknown and hence a user function, with an unknown signature.
- Unlike with the current way around, there's unlikely to be a use case for shadowing a namespaced name with a global one; it will just be a gotcha that trips people up occasionally.
Indeed. But this is a downside of both these approaches.
None of these seem like showstoppers to me, but since we can so easily go one step further to "global only", and avoid them, why wouldn't we?
Your answer to that seems to be that you think "global only" is a bigger BC break, but I wonder how much difference it really makes. As in, how many codebases are using unqualified calls to reference a namespaced function, but not shadowing a global name?
I hope this provides some additional insight. Looking at the analysis,
I'm not completely opposed to your approach. There are some open
questions. For example, how do we handle functions declared and called
in the same file?
namespace Foo;
function bar() {}
bar();
Without a local fallback, it seems odd for this call to fail. An
option might be to auto-use Foo\bar when it is declared, although that
will require a separate pass over the top functions so that functions
don't become order-dependent.
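As a point of reference, here is a minimal sketch of how the call above can already be written unambiguously in current PHP, which is presumably what a "global only" rule would require in the absence of an auto-use mechanism. The return value is made up so that the result can be inspected.

```php
<?php
namespace Foo;

function bar(): string
{
    return 'Foo\bar';
}

// Relative to the current namespace, using the namespace keyword.
$viaKeyword = namespace\bar();

// Fully qualified from the root namespace.
$viaFqn = \Foo\bar();
```

Both forms name Foo\bar explicitly, so neither depends on any fallback order.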
Ilija
On Fri, Aug 23, 2024 at 9:41 PM Rowan Tommins [IMSoP]
imsop.php@rwec.co.uk wrote:IMO, 1. is too drastic. As people have mentioned, there are tools to
automate disambiguation. But unless we gain some other benefit from
dropping the lookup entirely, why do it?I can think of a few disadvantages of "global first":
- Fewer code bases will be affected, but working out which ones is harder. The easiest migration will probably be to make sure all calls to namespaced functions are fully qualified, as though it was "global only".
To talk about more concrete numbers, I now also analyzed how many
relative calls to local functions there are in the top 1000 composer
packages.https://gist.github.com/iluuu1994/9d4bbbcd5f378d221851efa4e82b1f63
There were 4229 calls to local functions that were statically visible.
Of those, 1534 came from thecodingmachine/safe, which I'm excluding
again for a fair comparison. The remaining 2695 calls were split
across 210 files and 27 repositories, which is less than I expected.The calls that need to be fixed by swapping the lookup order are a
subset of these calls, namely only the ones also clashing with some
global function. Hence, the process of identifying them doesn't seem
fundamentally different. Whether the above are "few enough" to justify
the BC break, I don't know.
- The engine won't be able to optimise calls where the name exists locally but not globally, because a userland global function could be defined at any time.
When relying on the lookup, the lookup will be slower. But if the
hypothesis is that there are few people relying on this in the first
place, it shouldn't be an issue. It's also worth noting that many of
the optimizations don't apply anyway, because the global function is
also unknown and hence a user function, with an unknown signature.
- Unlike with the current way around, there's unlikely to be a use case for shadowing a namespaced name with a global one; it will just be a gotcha that trips people up occasionally.
Indeed. But this is a downside of both these approaches.
None of these seem like showstoppers to me, but since we can so easily go one step further to "global only", and avoid them, why wouldn't we?
Your answer to that seems to be that you think "global only" is a bigger BC break, but I wonder how much difference it really makes. As in, how many codebases are using unqualified calls to reference a namespaced function, but not shadowing a global name?
I hope this provides some additional insight. Looking at the analysis,
I'm not completely opposed to your approach. There are some open
questions. For example, how do we handle functions declared and called
in the same file?
namespace Foo;
function bar() {}
bar();
Without a local fallback, it seems odd for this call to fail. An
option might be to auto-use Foo\bar when it is declared, although that
will require a separate pass over the top functions so that functions
don't become order-dependent.
Ilija
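For reference, here is a minimal sketch of how current PHP (local-first lookup) resolves unqualified calls in a namespace. The namespace Foo and bar() come from the example above; the shadowing strlen() and its return value are made up purely for illustration.

```php
<?php
namespace Foo;

// Illustrative shadow of the internal strlen(), visible to unqualified
// calls made from namespace Foo.
function strlen(string $s): int
{
    return -1;
}

function bar(): string
{
    return 'Foo\\bar';
}

$local  = strlen('abc');   // current PHP checks Foo\strlen first
$global = \strlen('abc');  // fully qualified: always the internal function
$same   = bar();           // Foo\bar, declared and called in the same file
```

Under a "global first" order, the first call would reach the internal strlen() instead. The bar() call would still resolve locally, because no global bar() exists.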
Hey Ilija,
I'm actually coming around to global first, then local second. I haven't gotten statistically significant results yet, but preliminary results show that global first gives Symfony/Laravel their speed boost and function autoloading gives things like WordPress their speed boost. Everyone wins.
For function autoloading, it is only called on the local check. So, it looks kinda like this:
- does it exist in global namespace?
  - yes: load the function; done.
  - no: continue
- does it exist in local namespace?
  - yes: load the function; done.
  - no: continue
- call the autoloader for local namespace.
- does it exist in local namespace?
  - yes: load the function; done.
  - no: continue
- does it exist in the global namespace?
  - yes: load the function; done.
  - no: continue
It checks the scopes in reverse order after autoloading because it is more likely that the autoloader loaded a local scope function than a global one. This adds a small inconsistency (if the autoloader were to load both a global and non-global function of the same name), but keeps autoloading fast for unqualified function calls. By checking global first, for OOP-centric codebases like Symfony and Laravel that call unqualified global functions, they never hit the autoloader. For things that do call qualified local-namespace functions, they hit the autoloader and immediately start loading them. The worst performance then becomes autoloading global functions that are called unqualified. Not only do you have to strip out the current namespace in the autoloader, but you have to deal with being the absolute last check in the function table. However, (and I'm still trying to figure out how to quantify this), I'm reasonably certain projects do not use global functions that often.
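The lookup order described above can be sketched as a small simulation. This is only an illustration: $table stands in for the engine's function table and $autoload for a hypothetical function autoloader. Neither is a real PHP API, and the function and variable names are assumptions.

```php
<?php
// Simulation only: $table stands in for the engine's function table and
// $autoload for a hypothetical function autoloader (not a real PHP API).
function resolveUnqualified(string $name, string $ns, array &$table, callable $autoload): ?string
{
    $global = strtolower($name);
    $local  = strtolower($ns . '\\' . $name);

    // Before autoloading: global first, then local.
    if (isset($table[$global])) { return $global; }
    if (isset($table[$local])) { return $local; }

    // The autoloader is only asked for the namespaced name.
    $autoload($local, $table);

    // After autoloading: local first, then global.
    if (isset($table[$local])) { return $local; }
    if (isset($table[$global])) { return $global; }

    return null;
}

$table = ['strlen' => true];
$asked = [];
$autoload = function (string $name, array &$table) use (&$asked): void {
    $asked[] = $name;
    if ($name === 'ns\\myfunc') {
        $table[$name] = true; // pretend the defining file was included
    }
};

$first  = resolveUnqualified('strlen', 'ns', $table, $autoload);
$second = resolveUnqualified('myfunc', 'ns', $table, $autoload);
$third  = resolveUnqualified('missing', 'ns', $table, $autoload);
```

In this sketch the strlen call never reaches the autoloader, which is the property that keeps unqualified calls to internal functions cheap.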
— Rob
Amendment:
Actually, I may skip allowing the second check in the global space for autoloaders. In other words, if you want to autoload a global function, you need to call it fully qualified. It's not 100% ideal, but better than pinning and better performance for everyone.
— Rob
In other words, if you want to autoload a global function, you need to call it fully qualified.
When I said this thread reads like an April fools joke that wasn't a challenge you know.
Are you seriously suggesting that unqualified function lookups should be global first, then local, except if it's to be autoloaded, and then the global ones have to be fully qualified?
More like it only supports autoloading locally namespaced functions when unqualified. So, everything else works exactly the same.
Here's a table that might help (note, just typed it up off the top of my head so may have errors) for a global-first behavior:
| defined | qualified | type   | from   | autoload | name      | example             |
|---------|-----------|--------|--------|----------|-----------|---------------------|
| true    | true      | global | N/A    | false    | N/A       | \strlen('hello')    |
| false   | true      | global | N/A    | true     | \myfunc   | \myfunc('hello')    |
| true    | false     | global | global | false    | N/A       | strlen('hello')     |
| true    | false     | global | ns     | false    | N/A       | strlen('hello')     |
| false   | false     | global | ns     | true     | ns\strlen | strlen('hello')     |
| false   | false     | global | global | true     | \strlen   | \strlen('hello')    |
| true    | true      | ns     | N/A    | false    | N/A       | \ns\myfunc('hello') |
| false   | true      | ns     | N/A    | true     | ns\myfunc | \ns\myfunc('hello') |
| true    | false     | ns     | ns     | false    | N/A       | myfunc('hello')     |
| false   | false     | ns     | ns     | true     | ns\myfunc | myfunc('hello')     |
With "local-first": if your autoloader receives a name "ns\strlen" then you should look for ns/strlen. An optimized autoloader will have a function map (similar to a class map) that can quickly determine whether that function exists in the project and where to load it from. For example, in my tests, I have a function map that breaks up the map into a specialized trie that appears to be faster than an array for an arbitrary number of functions. In this case, it would know to drop it after about 1-2 steps into the prefix tree, return, and let the engine look up the global function.
With global-first, the autoloader never even gets called for something like strlen; instead it will be resolved in the global scope.
Now let's look at the case where you have written a function called "myfunc()" in the global namespace and you want it to be autoloaded. In some namespace ("ns"), the developer calls "myfunc()" unqualified, before it has been defined. The autoloader will be called (in both implementations) with the name "ns\myfunc" and it will be up to the autoloader implementation what to do about this. It can first walk the trie and decide there is nothing to do here, which is the most performant option. Alternatively, it can get the basename of "ns\myfunc" (which would be "myfunc") and walk the trie again. Say it does that and finds your function. Now when we return from the autoloader, we have to check the function table for both, again.
If we only allow autoloading from the current namespace for unqualified calls, we simplify autoloading implementations and speed things up for everyone. Someone can come along and amend this with an RFC in the future, but it would be much harder to go the other way around.
Further, you can always call your global function, like "\myfunc()" and it would "just work."
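To make the function-map idea concrete, here is a minimal sketch of a map keyed by namespace segments. It assumes nothing about the actual implementation mentioned above: the FunctionMap class, its methods and the file path are hypothetical, and a real trie may be organised quite differently.

```php
<?php
// Hypothetical function map, loosely analogous to a Composer class map.
// Names are split on "\" and stored as a tree of namespace segments.
final class FunctionMap
{
    private array $root = [];

    public function add(string $name, string $file): void
    {
        $node = &$this->root;
        foreach (explode('\\', strtolower($name)) as $part) {
            $node['children'][$part] ??= [];
            $node = &$node['children'][$part];
        }
        $node['file'] = $file;
    }

    public function find(string $name): ?string
    {
        $node = $this->root;
        foreach (explode('\\', strtolower($name)) as $part) {
            if (!isset($node['children'][$part])) {
                return null; // unknown prefix: give up after a step or two
            }
            $node = $node['children'][$part];
        }
        return $node['file'] ?? null;
    }
}

$map = new FunctionMap();
$map->add('ns\\myfunc', 'src/ns/functions.php');

$hit   = $map->find('ns\\myfunc');    // known function
$miss  = $map->find('ns\\strlen');    // known namespace, unknown function
$early = $map->find('other\\strlen'); // unknown namespace, rejected at step one
```

An autoloader built on such a map could return immediately on a miss and leave the global lookup to the engine, which matches the "nothing to do here" path described above.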
— Rob
Stephen
When I said this thread reads like an April fools joke that wasn't a challenge you know.
We just had somebody temporarily banned for ad-hominem attacks like
a week ago. Please familiarize yourself with the mailing list rules.
They apply to everyone.
https://github.com/php/php-src/blob/master/docs/mailinglist-rules.md
Most significantly:
a. Make everybody happier, ...
and
- Do not post when you are angry. Any post can wait a few hours. Review your post after a good breather, or a good nights sleep.
Ilija
Hi Ilija,
I understand that emotion and expression don't convey well through text, but I will assure you that my response to Rob was in no part related to "anger".
I also understand that sarcasm doesn't come naturally to some, and it's not at all recognised by some languages/cultures, so my bad for not specifically pointing out that I was being sarcastic there. However I would be very surprised if any English speaker who understands the concept of sarcasm, doesn't realise/recognise "I didn't mean X as a challenge" is a very common sarcastic response.
Cheers
Stephen
For what it's worth, I specifically was not offended and took it as humour/sarcasm/exasperation.
I actually did lol.
— Rob
In a world where global functions take precedence over local ones
because some people don't like writing a single \ character,
autoloading would be a moot point because if you preference global
functions you're implicitly telling developers they shouldn't write
namespaced functions, by making them harder and less intuitive to use.
Sorry to reply to the same message twice, but as a concrete example, consider this code:
// Definition
namespace Acme\Foo;
class Utils {
    public static function magic(string $x): int {
        return \strlen($x);
    }
    public static function more_magic(string $x): int {
        return self::magic($x) * 2;
    }
}
// Caller
namespace Acme\MyApp\SearchPage;
use Acme\Foo\Utils;
echo Utils::more_magic($_GET['query']);
Rewritten as namespaced functions, with current PHP:
// Definition
namespace Acme\Foo\Utils;
function magic(string $x): int {
    return strlen($x);
}
function more_magic(string $x): int {
    return magic($x) * 2;
}
// Caller
namespace Acme\MyApp\SearchPage;
use Acme\Foo\Utils;
echo Utils\more_magic($_GET['query']);
With "unqualified names are global", but a new "_" shorthand for "relative to current", the caller is completely unaffected, but the definition becomes:
namespace Acme\Foo\Utils;
function magic(string $x): int {
    return strlen($x);
}
function more_magic(string $x): int {
    return _\magic($x) * 2;
}
Note how the "_" is used in all the same places as "self::" was in the "static class" version.
With "unqualified names are local", the change is very similar, but "the other way around":
namespace Acme\Foo\Utils;
function magic(string $x): int {
    return \strlen($x);
}
function more_magic(string $x): int {
    return magic($x) * 2;
}
Regards,
Rowan Tommins
[IMSoP]
It also
sparked some related ideas, like providing modules that lock
namespaces and optimize multiple files as a singular unit. That said,
such approaches would likely be significantly more complex than the
approach proposed here (~30 lines of C code).
There was an entire thread about modules and packages and shenanigans not too long ago. It’s rather fascinating. Highly recommend participating or starting a new thread. It seems that people are interested in it, and want it.
— Rob
Hi everyone
As you probably know, a common performance optimization in PHP is to
prefix global function calls in namespaced code with a \. In
namespaced code, relative function calls (meaning, not prefixed with
\, not imported and not containing multiple namespace components)
will be looked up in the current namespace before falling back to the
global namespace. Prefixing the function name with \ disambiguates
the called function by always picking the global function.
Not knowing exactly which function is called at compile time has a
couple of downsides to this:
- It leads to the aforementioned double-lookup.
- It prevents compile-time-evaluation of pure internal functions.
- It prevents compiling to specialized opcodes for specialized
internal functions (e.g. strlen()).
- It requires branching for frameless functions [1].
- It prevents an optimization that looks up internal functions by
offset rather than by name [2].
- It prevents compiling to more specialized argument sending opcodes
because of unknown by-value/by-reference passing.
All of these are enabled by disambiguating the call. Unfortunately,
prefixing all calls with \, or adding a use function at the top of
every file is annoying and noisy. We recently got a feature request to
change how functions are looked up [3].
I think there should be some way to use globals first at compile time.
I had suggested a per-file directive in a post to this list a while
back. Something like:
namespace foo;
use global functions;
class MyClass {
    // do stuff.
}
Where use global functions would be a special token that the compiler
uses to skip the ns lookup and use dedicated opcodes when available.
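For comparison, current PHP already supports a per-name form of this through use function imports, as mentioned in the original post. A minimal sketch follows; the length() method and the sample string are illustrative only.

```php
<?php
namespace foo;

// Existing syntax: binds unqualified strlen() in this file to \strlen,
// so the compiler no longer needs the namespaced lookup for it.
use function strlen;

class MyClass
{
    public function length(string $value): int
    {
        return strlen($value);
    }
}

$length = (new MyClass())->length('hello');
```

The proposed directive would amount to doing this for every global function at once, without listing each name.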
I had suggested a per-file directive in a post to this list a while
back. Something like:
namespace foo;
use global functions;
There was a proposal for exactly this a few years ago, which ended up in an RFC with a slightly different syntax (using a declare() statement), but was declined in voting by 35 votes to 2.
I can't remember much about the discussion, so am not sure what changes would make a new attempt more likely to pass.
Regards,
Rowan Tommins
[IMSoP]
In all likelihood, it was the syntax that was disfavored.
What about an RFC where we vote on if the feature should exist, without
any syntax?
ie. "Should there be a way for developers to signal to the parser that
all functions should be treated as global and skip NS lookup, and use
dedicated opcodes? The specific syntax would be decided in a different
RFC/vote if this one passes."
Yes: We should do this, let's discuss syntax possibilities.
No: This should not be a feature at all.
There was a proposal for exactly this a few years ago, which ended up
in an RFC with a slightly different syntax (using a declare()
statement), but was declined in voting by 35 votes to 2.
Sorry, I forgot the link: https://wiki.php.net/rfc/use_global_elements
In all likelihood, it was the syntax that was disfavored.
There was a lot of discussion beforehand about the syntax, and the RFC attempts to summarise some of it, but skimming through the voting thread https://externals.io/message/108306 I don't think that was what made it fail. The objections seem to be mostly about the general approach, not the details.
Rowan Tommins
[IMSoP]
Hi,
Am 02.08.24 um 18:51 schrieb Ilija Tovilo:
...
There are a few noteworthy downsides:
- Unqualified calls to functions in the same namespace would be
slightly slower, because they now involve checking global scope first.
I believe that unqualified, global calls are much more common, so this
change should still result in a net positive. It's also possible to
avoid this cost by adding a use function to the top of the file.
- Introducing new functions in the global namespace could cause a BC
break for unqualified calls, if the function happens to have the same
name. This is unfortunate, but likely rare. Since new functions are
only introduced in minor/major versions, this should be manageable,
but must be considered for every PHP upgrade.
- Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
declare functions called from some file in the files namespace to
intercept these calls. This use-case would break. That said, it is
somewhat of a fragile approach to begin with, given that it wouldn't
work for fully qualified calls, or unnamespaced code.
Similar to Symfony's ClockMock this "feature" was propagated some years
ago to e.g. intercept calls to the file system when running tests where
the application was not designed with test-ability in mind.
Regards,
Thomas
Hi,
I propose the following alternative approach:
- establish a restricted whitelist of global functions for which the performance gain would be noteworthy if there wasn’t any need to look at local scope first;
- for those functions, disallow to define a function of same name in any namespace, e.g.: https://3v4l.org/RKnZt
That way, those functions could be optimised, but the current semantics of namespace lookup would remain unchanged.
—Claude
Hi Claude
I propose the following alternative approach:
establish a restricted whitelist of global functions for which the performance gain would be noteworthy if there wasn’t any need to look at local scope first;
for those functions, disallow to define a function of same name in any namespace, e.g.: https://3v4l.org/RKnZt
That way, those functions could be optimised, but the current semantics of namespace lookup would remain unchanged.
That would be an improvement over the status quo. However, if you look
at the bullet points in my original email, while some of the
optimizations apply only to some functions (CTE, custom opcodes and
frameless calls), others apply to all internal, global functions
(double-lookup, lookup by offset, specialized argument passing).
Hence, we may only get a fraction of the benefits by restricting the
optimization to a handful of functions. I also wonder if the impact is
actually bigger, as then there's no workaround for redeclaring the
function, requiring much bigger refactoring.
Ilija
Also, overriding any default function is one of the benefits of
namespacing in the first place. If we say "built-in functions can't be
overridden", then you are basically saying that all built-in functions
are global.
But there is a valid use case for overriding built-ins. You may want to
disable built-in functionality with a stub for unit testing.
There should probably be a per-file way of setting the default either
way without ns lookups.
use global functions
- or -
use local functions
- or -
omit directive to use dynamic NS lookup for BC.
Hi Ilija,
I think this proposal has legs, and you are right to rekindle it,
instead of letting it die quietly.
- Some mocking libraries (e.g. Symfony's ClockMock [5]) intentionally
declare functions called from some file in the files namespace to
intercept these calls. This use-case would break. That said, it is
somewhat of a fragile approach to begin with, given that it wouldn't
work for fully qualified calls, or unnamespaced code.
My only concern is there needs to be an alternative way to do this:
intercepting internal calls. Sometimes, whether due to poor architecture
or otherwise, we just need to be able to replace an internal function
call. One example I can think of recently is where I had to replace
header() with a void function in tests, just to stop some legacy code
emitting headers before the main framework kicked in, then unable to
emit its own response because HTTP headers had already been sent. In a
perfect world it shouldn't be necessary, but sometimes it is, so I think
for this proposal to be palpable there must still be a way to achieve this.
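A minimal sketch of that kind of interception, as it works with the current local-first lookup. The Legacy namespace, the emitEarly() stand-in and the $GLOBALS key are hypothetical and only illustrate the technique.

```php
<?php
namespace Legacy;

// Test-only shadow: unqualified header() calls made from namespace Legacy
// land here instead of in the internal function.
function header(string $line): void
{
    $GLOBALS['capturedHeaders'][] = $line;
}

// Stand-in for legacy code that emits headers too early.
function emitEarly(): void
{
    header('X-Legacy: 1');
}

$GLOBALS['capturedHeaders'] = [];
emitEarly();
```

With a global-first lookup, the call inside emitEarly() would reach the internal header() again, which is why an alternative interception mechanism is being asked for.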
Cheers,
Bilge
I was thinking about a similar problem this week.
If class A relies on class B, but you want to swap out
class B with a stub to test class A in isolation,
is there a way to make every call to class B,
from class A, actually call a different class
during the test, without modifying class A's code?
Minimal code for discussion purposes:
// conf class in global namespace
abstract class CONF {
    const DATABASE_HOST_NAME = 'db.example.com';
    const DATABASE_NAME = 'production';
    const DATABASE_USER_NAME = 'prod_user';
    const DATABASE_PASSWORD = '123';
}
// conf class in test namespace:
namespace test;
abstract class CONF {
    const DATABASE_HOST_NAME = 'db.sandbox.com';
    const DATABASE_NAME = 'test';
    const DATABASE_USER_NAME = 'test_user';
    const DATABASE_PASSWORD = 'abc';
}
// SQL class in global namespace
class SQL {
    // Declared so that Init() has somewhere to store the connection.
    private static ?PDO $oPDO = null;
    private function Init(){
        self::$oPDO = new PDO(
            'mysql:host='.CONF::DATABASE_HOST_NAME.
            ';dbname='.CONF::DATABASE_NAME.';charset=utf8mb4',
            CONF::DATABASE_USER_NAME,
            CONF::DATABASE_PASSWORD,
            []
        );
    }
}
// Testing class in test namespace:
namespace test;
class SQLTester {
    // How do I make the SQL class see \test\CONF instead of
    // \CONF, when SQL calls for CONF in this test scope,
    // without changing anything inside of the SQL class?
}
I think some kind of sandboxing tools would be useful for
build/test/deployment.
I think some kind of sandboxing tools would be useful for
build/test/deployment.
There are uopz[1] and runkit7[2] available on PECL which can be used to
unit-test untestable code (and more), but you are likely better off
refactoring such code sooner rather than later, since such extensions may
easily break for new minor PHP versions (and occasionally, such breaks
may not be fixable at all[3]), and often are completely broken for new
major PHP versions (uopz got a completely different API for PHP 7, and
runkit was even provided as a new extension named runkit7). And
maintaining such extensions is a PITA[4], and as such, compatibility with
new PHP versions may not be available when you need it.
[1] https://pecl.php.net/package/uopz
[2] https://pecl.php.net/package/runkit7
[3] https://github.com/krakjoe/uopz/issues/176
[4] https://github.com/zenovich/runkit/issues/87
Cheers,
Christoph
You could hack this out using the autoloader, but it's something that the
PHP community frowns upon, imo. A much more prevalent practice in the PHP
ecosystem is a Dependency Injection container. A somewhat similar concept
exists in the JavaScript ecosystem with hoisted import statements and
mocked modules, but if you don't understand the system and are unaware
that the order of import execution decides whether the mock succeeds,
it becomes a cumbersome and awkward system.
Regardless, it's not possible for functions, as users don't control a
function autoloader.
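As a rough illustration of the dependency-injection route, here is a sketch in which the configuration is passed in rather than read from \CONF. It does require changing the SQL class, which the original question wanted to avoid. DbConfig, dsn() and connect() are hypothetical names, and the constructor syntax assumes PHP 8.1 or later.

```php
<?php
// Hypothetical value object replacing the static CONF constants.
final class DbConfig
{
    public function __construct(
        public readonly string $host,
        public readonly string $name,
        public readonly string $user,
        public readonly string $password,
    ) {}
}

final class SQL
{
    // The configuration is injected, so a test can hand in sandbox values.
    public function __construct(private DbConfig $config) {}

    public function dsn(): string
    {
        return 'mysql:host=' . $this->config->host
            . ';dbname=' . $this->config->name . ';charset=utf8mb4';
    }

    public function connect(): PDO
    {
        return new PDO($this->dsn(), $this->config->user, $this->config->password, []);
    }
}

// In a test, build the object with sandbox settings; no connection is opened here.
$testSql = new SQL(new DbConfig('db.sandbox.com', 'test', 'test_user', 'abc'));
$dsn = $testSql->dsn();
```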
Marco Deleu
Good morning,
I am writing to request RFC karma for the wiki account with username
nlockheart.
I would like to write an RFC for community discussion and
consideration.
Thank you,
Nick Lockheart
I am writing to request RFC karma for the wiki account with username
nlockheart.
I would like to write an RFC for community discussion and
consideration.
RFC karma granted. Good luck with the RFC!
Christoph
If class A relies on class B, but you want to swap out
class B with a stub to test class A in isolation,
is there a way to make every call to class B,
from class A, actually call a different class
during the test, without modifying class A's code?
There are libraries that do exactly this, such as Mockery's "overload"
and "alias"; and some that do other transparent manipulations, like
https://github.com/dg/bypass-finals
They work by generating code dynamically, based on the real code, and
executing it before the real definition is loaded. The same approach
could definitely be taken to replace every call to a global function,
and would actually be more reliable than shadowing, because it could
rewrite even calls with a leading "\" or "use function" statement.
Obviously, shadowing a function in one namespace is currently a lot
easier than setting up such a rewriter; but I don't think we should let
that convenience for a few use cases outweigh the benefits in
performance that a change in behaviour could bring, particularly when
combined with function autoloading.
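A very rough sketch of what such a rewriter could look like, using PHP 8's built-in tokenizer. This is not how Mockery or bypass-finals work internally: rewriteCalls() and \Test\fakeHeader are hypothetical names, and "use function" imports, callable strings and other indirect calls are not handled.

```php
<?php
// Hypothetical helper: rewrites direct calls to one global function so that
// they target a replacement, whether or not the call has a leading "\".
function rewriteCalls(string $source, string $target, string $replacement): string
{
    $tokens = PhpToken::tokenize($source);
    // A name after these tokens is a method, a declaration or a class, not a function call.
    $skipAfter = [T_OBJECT_OPERATOR, T_NULLSAFE_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION, T_NEW];
    $out = '';
    $prev = null; // last token that is not whitespace or a comment

    foreach ($tokens as $i => $token) {
        $text = $token->text;
        if ($token->is([T_STRING, T_NAME_FULLY_QUALIFIED])
            && strcasecmp(ltrim($text, '\\'), $target) === 0
            && ($prev === null || !$prev->is($skipAfter))
        ) {
            // Only rewrite when the name is followed by "(".
            $j = $i + 1;
            while (isset($tokens[$j]) && $tokens[$j]->isIgnorable()) {
                $j++;
            }
            if (isset($tokens[$j]) && $tokens[$j]->text === '(') {
                $text = $replacement;
            }
        }
        $out .= $text;
        if (!$token->isIgnorable()) {
            $prev = $token;
        }
    }

    return $out;
}

$source = "<?php\nheader('A');\n\\header('B');\n\$response->header('C');\n";
$rewritten = rewriteCalls($source, 'header', '\\Test\\fakeHeader');
```

The rewritten source would then be loaded in place of the original file, for example through a stream wrapper or a custom autoloader. That setup cost is the inconvenience mentioned in the next paragraph.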
--
Rowan Tommins
[IMSoP]
My only concern is there needs to be an alternative way to do this: intercepting internal calls. Sometimes, whether due to poor architecture or otherwise, we just need to be able to replace an internal function call. One example I can think of recently is where I had to replace header() with a void function in tests, just to stop some legacy code emitting headers before the main framework kicked in, then unable to emit its own response because HTTP headers had already been sent. In a perfect world it shouldn't be necessary, but sometimes it is, so I think for this proposal to be palpable there must still be a way to achieve this.
Just a tangent thought to the above, but I've always been a little concerned with the idea that a malicious composer package could potentially do nasty things because PHP looks at the local namespace first for functions. For example, if a composer package focused on Laravel defined malicious versions of internal functions in common namespaces like App\Models, App\Http\Controllers, etc., it could do some nasty stuff -- and supply-chain attacks aren't exactly uncommon. Even worse is WordPress or any other PHP-based software package that allows arbitrary plugins to be installed by non-technical users who really would have no idea if the package was safe even if they were looking at the code.
<?php
// something.php
namespace App\Models;
function password_hash(string $password, string|int|null $algo, array $options = []): string
{
    print("Hello");
    return $password;
}
<?php
// my code
namespace App\Models;
include "something.php";
password_hash('foobar', PASSWORD_DEFAULT);
I don't recall why local namespace first won, but IMO it wasn't a great call out of the gate for that reason alone. Yes, you can always use \password_hash instead of password_hash, but making the default insecure and slower is silly IMO -- and not fixing it because of BC seems like the weaker argument here.
John
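For illustration, the difference between an unqualified and a fully qualified call can be reproduced in a single file. This is only a sketch of the current lookup order: the shadowing strtoupper() stands in for a function declared by third-party code, and unqualifiedCall()/qualifiedCall() are hypothetical names.

```php
<?php
namespace App\Models;

// Stand-in for a look-alike function declared by third-party code
// inside the application's namespace.
function strtoupper(string $value): string
{
    return 'shadowed:' . $value;
}

// Unqualified call: the current namespace is searched first, so this
// reaches App\Models\strtoupper() rather than the internal function.
function unqualifiedCall(string $value): string
{
    return strtoupper($value);
}

// Fully qualified call: always the global (internal) function.
function qualifiedCall(string $value): string
{
    return \strtoupper($value);
}

echo unqualifiedCall('abc'), "\n"; // shadowed:abc
echo qualifiedCall('abc'), "\n";   // ABC
```

Under the proposed global-first lookup, the unqualified call would presumably reach the internal function as well.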
If this is an attack vector for your application, then fully qualified names is the way to go (WordPress does this nearly everywhere, for example).
It's not (at least for me) the BC break. It's being able to override global functions. There are legitimate use-cases outside of testing. For example, consider when a global function signature changes. In your library, you have to check the php version. You can change this 100 times for every single call, or you can just wrap it in a function that supports the old signature and proxies it to the new signature. In other words, it provides options that may be better than the alternative.
— Rob
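As a rough sketch of the wrapper use case described above, consider implode(), which stopped accepting the legacy (array, separator) argument order in PHP 8.0. The Acme\Compat namespace is hypothetical, and the example assumes PHP 8.0+ for union types. A namespaced function can keep accepting both orders and forward to the global one:

```php
<?php
namespace Acme\Compat; // hypothetical library namespace

// Accepts both the current (separator, array) order and the legacy
// (array, separator) order, then forwards to the global implode().
function implode(string|array $first, string|array|null $second = null): string
{
    if (is_array($first)) {
        // Legacy order, or the one-argument form implode($array).
        return \implode(is_string($second) ? $second : '', $first);
    }
    return \implode($first, $second ?? []);
}

// Unqualified calls inside this namespace reach the wrapper under the
// current local-first lookup, so existing call sites need no changes.
echo implode(',', ['a', 'b']), "\n"; // a,b
echo implode(['a', 'b'], ','), "\n"; // a,b
```

This only works without touching call sites because of the local-first lookup; with a global-first lookup the call sites would need an explicit "use function Acme\Compat\implode;" or a qualified name.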
If this is an attack vector for your application, then fully qualified names is the way to go (WordPress does this nearly everywhere, for example).
This is an attack vector for every application, and I would argue it should be a real concern for the vast majority of applications out there -- any that rely on namespace-based frameworks and composer packages from untrustworthy sources. It's not just WordPress -- literally every single PHP application that uses a publicly available framework and consumes external composer packages should be fully qualifying its internal function calls. The natural behavior of the language shouldn't be the insecure way of doing things for the sake of maintaining backward compatibility with existing, insecure code.
Cheers,
John
Hi John
Including a malicious composer package already allows for arbitrary
code execution, do you really need more than that?
Ilija
Including a malicious composer package already allows for arbitrary
code execution, do you really need more than that?
Of course. We've seen many examples in the wild of 3rd party libraries getting hijacked to inject malicious code (e.g. the whole xz attack). This behavior in PHP is not obvious, and provides a way to covertly target and hijack specific highly sensitive functions without an obvious way to detect it -- while otherwise behaving exactly as a developer would expect.
Why possibly would we want to make it easier to perform such an attack, which as Illija pointed out is actually making PHP slower, in the name of backward compatibility? Defense in depth is a cornerstone of application security.
John
Forgive me, s/Illija/you :)
John
If you have the ability to inject arbitrary code, you've already lost. It doesn't matter whether they use this feature, or just register a shutdown function, autoloader, replace classes/functions/methods entirely, or whatever. Should we remove those features as well?
— Rob
If you have the ability to inject arbitrary code, you've already lost. It doesn't matter whether they use this feature, or just register a shutdown function, autoloader, replace classes/functions/methods entirely, or whatever. Should we remove those features as well?
I think it's a fallacy to claim "well if they got this far the game is over" when it comes to application security. There are a million ways an attacker could use this feature to covertly gain access to things like passwords before they are encrypted, etc. that would enable lateral movement within an organization that otherwise they might have difficulty achieving even with RCE in a properly locked down system (e.g. PHP doesn't have the ability to write to the filesystem / overwrite existing classes, etc.)
Regarding the subject at hand I've made my case here and we can agree to disagree -- changing the function lookup order is an easy win with security benefits and, according to Ilija, performance benefits. I think it should be seriously considered.
John
As for providing a migration path: One approach might be to introduce
an INI setting that performs the function lookup in both local and
global scope at run-time, and informs the user about the behavioral
change in the future.
That INI setting would control the warning, and not the
functionality, right?
Lastly, I've already raised this idea in the PHP Foundation's internal
chat but did not receive much positive feedback, mostly due to fear of
the potential BC impact. I'm not particularly convinced this is an
issue, given the impact analysis. Given the surprisingly large
performance benefits, I was inclined to raise it here anyway.
I am surprised that it is that much of a performance benefit as well,
but I am also concerned about the BC impact. But if that isn't too much,
then I guess we need to consider this, but only for a major version. Not
something I believe we can change in an 8.x version.
cheers,
Derick
As for providing a migration path: One approach might be to introduce
an INI setting that performs the function lookup in both local and
global scope at run-time, and informs the user about the behavioral
change in the future.
That INI setting would control the warning, and not the
functionality, right?
Yes, that was my suggestion. First, in a future minor version, an INI
option could be added that would warn when finding both a local and
global function when performing an unqualified function call. The only
reason to hide this behind a setting is to avoid the cost of a double
lookup in production code when one isn't necessary, i.e. when calling
local functions in some namespace.
I am surprised that it is that much of a performance benefit as well,
but I am also concerned about the BC impact. But if that isn't too much,
then I guess we need to consider this, but only for a major version. Not
something I believe we can change in an 8.x version.
Sure, waiting for 9.0 sounds reasonable if we were to choose this approach.
Ilija
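The condition such a warning would detect can be approximated in userland. This is only a sketch: the real check would live in the engine behind the INI setting, and isAmbiguous() and the App\Legacy namespace are hypothetical names.

```php
<?php
namespace App\Legacy; // hypothetical namespace

// A local function that shadows the global strlen().
function strlen(string $value): int
{
    return \strlen($value);
}

// Returns true when an unqualified call to $function made from $namespace
// could resolve to either a namespaced or a global function, i.e. the
// case whose behaviour would change if the lookup order were flipped.
function isAmbiguous(string $namespace, string $function): bool
{
    return \function_exists($namespace . '\\' . $function)
        && \function_exists($function);
}

foreach (['strlen', 'strrev'] as $name) {
    if (isAmbiguous(__NAMESPACE__, $name)) {
        echo $name, "() exists in both ", __NAMESPACE__, " and the global namespace\n";
    }
}
```

Only strlen is reported here, since strrev has no counterpart in the namespace. The double function_exists() check mirrors the double lookup mentioned above, which is why it would make sense to keep it off in production.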