[RFC] Namespace-scoped declares

8 years ago by Stanislav Malyshev

Hi!

The broader context of this proposal is to provide a simple and usable
mechanism that will allow developers to opt-in to stricter language
semantics on a per-library (or more specifically, per-namespace) basis,
thus alleviating backwards compatibility and library interoperability
concerns for such changes.

I don't think it's a good idea. Not only would we have two language
semantics in one language - which is by itself very far from ideal - but
you'd have absolutely no way of knowing which semantics is active for
which file by just reading that file. You will have to consider all code
that could potentially run up to this point, and all code paths that
could have been taken, and could disable or enable strict context. It
could also mean that the same code could actually run with both models,
depending on the caller - which goes contrary to the whole point of
strict declaration. It's way worse than ini setting - at least ini
setting is supposed to be one for every install of the code and can't
change in runtime depending on code paths.

Moreover, this precludes any optimization decision from being made by
opcode cache and such - if the same file code can be run in both strict
and non-strict context, depending on what was executed before in the
same request, it is impossible to make any optimization decision on
per-file basis.

Moreover, this RFC clearly builds an infrastructure for making more
semantic forks, eventually leading to the situation where reader of the
code has absolutely no idea, looking at the source of certain function,
what is actually the semantics of the language and the rules it will be
executed under. And neither, even worse, does the author of the code.

If that were localized by file, it'd be bad but one could grudgingly
tolerate it - you could scroll to the beginning of file and say "oh,
sigh, now we're in PHP with strict types, but lax objects, but strict
integers, but lax floats, but strict comparisons, but lax conditionals,
but strict argument counts! Now I understand what's going on if I only
keep in mind those 20 bits that are different in every file!". But after
delocalizing it, all hope is lost - you never know what the code in the
file actually means - because somebody could write code in completely
different file, maybe even JSON composer configuration or some other
config file you didn't even think to be able to change your language
semantics - and suddenly all the code works differently.

Or, for more fun, breaks differently. And you as code author have zero
control over it because of the wonders of shared mutable state which now
encompasses not only data but the very core of the language. Imagine how
fun it is if somebody's action in different code module wouldn't just
mess up some data - it would actually break your code by changing
language semantics for your code!

I don't think it is a good way to write maintainable software.

Stas Malyshev
smalyshev@gmail.com

8 years ago by Stanislav Malyshev

Hi!

keep in mind those 20 bits that are different in every file!". But after
delocalizing it, all hope is lost - you never know what the code in the
file actually means - because somebody could write code in completely
different file, maybe even JSON composer configuration or some other
config file you didn't even think to be able to change your language
semantics - and suddenly all the code works differently.

Oh, I missed even worse feature - if you have namespace A\B\C, then you
get shared semantic context not only from this namespace, but also from
\A and \A\B. Which means basically that finding what semantic context
you end up with becomes several times harder - you need to locate not
only who may be controlling your namespace's context, but who is
controlling contexts of all the parent ones. And each time a new
semantic fork is introduced, you'd scan the whole tree and ensure every
node on the way agrees on which side of the fork they are, or explicitly
define it if you don't want to rely on all the parent chain always
staying on the same side you are.
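
The parent-chain lookup described above can be sketched in userland PHP. This is only an illustration: effectiveDeclares() and the $registry array are hypothetical names, and the rule that a deeper namespace overrides its parents is an assumption, not something the RFC text quoted here confirms.

```php
<?php
// Hypothetical sketch: collect the declares that would apply to a namespace
// by walking from the outermost parent down to the namespace itself.
function effectiveDeclares(string $namespace, array $registry): array
{
    $effective = [];
    $prefix = '';
    foreach (explode('\\', $namespace) as $part) {
        $prefix = $prefix === '' ? $part : $prefix . '\\' . $part;
        // Assumption: declares on a deeper namespace override inherited ones.
        $effective = array_merge($effective, $registry[$prefix] ?? []);
    }
    return $effective;
}

// Hypothetical registry: who declared what, and for which namespace.
$registry = [
    'A'    => ['strict_types' => 1],
    'A\\B' => ['strict_types' => 0, 'ticks' => 1],
];

// Code in A\B\C sets nothing itself, yet ends up with two inherited settings.
print_r(effectiveDeclares('A\\B\\C', $registry));
```

To know what A\B\C runs under, every entry on the path has to be inspected, which is the cost the message above is pointing at.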

Stas Malyshev
smalyshev@gmail.com

8 years ago by Nikita Popov

On Fri, Sep 23, 2016 at 9:45 PM, Stanislav Malyshev smalyshev@gmail.com
wrote:

Hi!

The broader context of this proposal is to provide a simple and usable
mechanism that will allow developers to opt-in to stricter language
semantics on a per-library (or more specifically, per-namespace) basis,
thus alleviating backwards compatibility and library interoperability
concerns for such changes.

I don't think it's a good idea. Not only would we have two language
semantics in one language - which is by itself very far from ideal - but
you'd have absolutely no way of knowing which semantics is active for
which file by just reading that file. You will have to consider all code
that could potentially run up to this point, and all code paths that
could have been taken, and could disable or enable strict context. It
could also mean that the same code could actually run with both models,
depending on the caller - which goes contrary to the whole point of
strict declaration. It's way worse than ini setting - at least ini
setting is supposed to be one for every install of the code and can't
change in runtime depending on code paths.

Moreover, this precludes any optimization decision from being made by
opcode cache and such - if the same file code can be run in both strict
and non-strict context, depending on what was executed before in the
same request, it is impossible to make any optimization decision on
per-file basis.

No, this is not how it would work. While I did not go into the
technicalities of the implementation in this proposal, this issue is
briefly mentioned:

Namespace-scoped declares will have to be taken into account by opcache.
Namely, if a file is compiled with a certain set of namespace-scoped
declares, it cannot necessarily be reused if it is compiled with a
different set of declares. This could be solved by storing a checksum based
on the namespace-scoped declares at the time of compilation together with
the cached file and compare it when loading it. I believe this can be done
efficiently.

Compilation has to happen for a certain set of statically known declare
directives and the cached file will be fingerprinted to make sure it is not
reused if the declare directives are changed. In fact, it is not even
technically possible to treat certain declares (like ticks) at runtime,
because they require different codegen.
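
A minimal userland sketch of the fingerprinting idea, for illustration only. Opcache itself is implemented in C and nothing below is its actual code; declareFingerprint() and canReuseCachedScript() are hypothetical names, and SHA-256 over a sorted JSON encoding is just one possible choice of checksum.

```php
<?php
// Hypothetical helper: derive a stable fingerprint from the set of
// namespace-scoped declares, independent of the order they were set in.
function declareFingerprint(array $declares): string
{
    ksort($declares);
    return hash('sha256', json_encode($declares));
}

// Hypothetical cache check: a cached script is reused only if it was
// compiled under the same declares that are active now.
function canReuseCachedScript(string $storedFingerprint, array $currentDeclares): bool
{
    return hash_equals($storedFingerprint, declareFingerprint($currentDeclares));
}

$compiledUnder = declareFingerprint(['strict_types' => 1, 'ticks' => 0]);

// Same declares in a different order: the cached file can be reused.
var_dump(canReuseCachedScript($compiledUnder, ['ticks' => 0, 'strict_types' => 1]));

// One directive flipped: the file has to be recompiled.
var_dump(canReuseCachedScript($compiledUnder, ['strict_types' => 0, 'ticks' => 0]));
```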

Moreover, this RFC clearly builds an infrastructure for making more
semantic forks, eventually leading to the situation where reader of the
code has absolutely no idea, looking at the source of certain function,
what is actually the semantics of the language and the rules it will be
executed under. And neither, even worse, does the author of the code.

If that were localized by file, it'd be bad but one could grudgingly
tolerate it - you could scroll to the beginning of file and say "oh,
sigh, now we're in PHP with strict types, but lax objects, but strict
integers, but lax floats, but strict comparisons, but lax conditionals,
but strict argument counts! Now I understand what's going on if I only
keep in mind those 20 bits that are different in every file!". But after
delocalizing it, all hope is lost - you never know what the code in the
file actually means - because somebody could write code in completely
different file, maybe even JSON composer configuration or some other
config file you didn't even think to be able to change your language
semantics - and suddenly all the code works differently.

Or, for more fun, breaks differently. And you as code author have zero
control over it because of the wonders of shared mutable state which now
encompasses not only data but the very core of the language. Imagine how
fun it is if somebody's action in different code module wouldn't just
mess up some data - it would actually break your code by changing
language semantics for your code!

Err, okay.

Say I am a Symfony user. Say that before loading the library I include a
file with the following content:

namespace Symfony\Whatever\Namespace;
function strlen($str) {
    return \strlen($str) + 1;
}
// Repeat for a few more namespaces.

This will end up hijacking uses of the strlen() function within the Symfony
codebase due to the way the global namespace fallback works.
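
The fallback behaviour can be reproduced in a single file. The sketch below uses a made-up Vendor\Lib namespace and a hypothetical nameLength() function standing in for library code; it is not taken from Symfony.

```php
<?php
namespace Vendor\Lib {
    // Hypothetical override, defined by the "user" rather than the library.
    function strlen($str) {
        return \strlen($str) + 1;
    }
}

namespace Vendor\Lib {
    // Stand-in for library code that calls strlen() without a leading backslash.
    function nameLength($name) {
        // An unqualified call is looked up in the current namespace first
        // and only then falls back to the global function.
        return strlen($name);
    }
}

namespace {
    var_dump(\Vendor\Lib\nameLength('abc')); // int(4), not int(3)
}
```

Writing \strlen() in the library code would bypass the lookup in the current namespace and always reach the built-in.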

OH MY GOD. If I can't even rely on the behavior of basic standard library
functions, what can I still rely on? A malicious user could completely
break my code! Nothing is certain anymore, all hope is lost! I should go
hide in the basement!

Of course, nobody is actually concerned about this. Yes, PHP is a
programming language, so you can break things pretty much however you like.
But it is common sense that you do not go about hijacking functions from
foreign namespaces and nobody is wasting time considering this possibility.

My analogy is probably a bit over the top, but I think this is really the
argument you're making. Yes, of course you can break things by setting
declares on foreign namespaces, but it wouldn't make any sense for anyone
to actually do this. Yes, you can cause confusion by choosing a different
set of declares for all the namespaces you use but ... why?

Realistically, if you maintain a library, you will have one global set of
declares you use for the entire library. I sure hope that it is not too
much to ask a library author to keep in mind the declares his project uses.

Actually, I would argue that it is much simpler to remember your global
library defaults, than to double-check whether the declares at the top of
the file you're currently editing are really the same as in the rest of
the project, or whether one option was maybe flipped. If you repeat all
your project defaults in every single file, you are bound to miss that one
case where dynamic_object_properties is set to 1 instead of 0, because that
particular file does require this functionality. If you have defaults that
are not repeated, this kind of explicit declare would stand out clearly.

Nikita

8 years ago by Dan Ackroyd

Imagine how
fun it is if somebody's action in different code module wouldn't just
mess up some data - it would actually break your code by changing
language semantics for your code!

If you are concerned about libraries modifying how code is run, the
same theoretical problem exists with libraries that register
autoloaders - 'omg they can totally change what code is even going to
be run, let alone the precise semantic behaviour'.

This turns out not to be a problem, as any library that did that would
lose all of its users immediately.

For this RFC, it would be usual to only have the namespace_declare()
in the same place as the autoloader(s) are registered; in the
bootstrap file before any real code is run.
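
A sketch of what such a bootstrap file might look like. namespace_declare() exists only in the RFC and in no released PHP version, so the call is guarded; its signature here follows my reading of the proposal and should be treated as an assumption. The Vendor\Lib namespace and the src/ layout are hypothetical.

```php
<?php
// bootstrap.php (hypothetical): runs once, before any library code is loaded.

spl_autoload_register(function (string $class): void {
    // Hypothetical PSR-4-style mapping for the Vendor\Lib namespace.
    $prefix = 'Vendor\\Lib\\';
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return;
    }
    $relative = str_replace('\\', '/', substr($class, strlen($prefix)));
    $file = __DIR__ . '/src/' . $relative . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// Proposed function, not available in released PHP: guarded so that this
// file still runs where it does not exist.
if (function_exists('namespace_declare')) {
    namespace_declare('Vendor\\Lib', ['strict_types' => 1]);
}
```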

cheers
Dan

8 years ago by Marco Pivetta

On Fri, Sep 23, 2016 at 11:58 PM, Dan Ackroyd danack@basereality.com
wrote:

On 23 September 2016 at 20:45, Stanislav Malyshev smalyshev@gmail.com
wrote:

Imagine how
fun it is if somebody's action in different code module wouldn't just
mess up some data - it would actually break your code by changing
language semantics for your code!

If you are concerned about libraries modifying how code is run, the
same theoretical problem exists with libraries that register
autoloaders - 'omg they can totally change what code is even going to
be run, let alone the precise semantic behaviour'.

And I'd also add: some libraries are doing amazing AOP things with this.

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/

8 years ago by Stanislav Malyshev

Hi!

If you are concerned about libraries modifying how code is run, the
same theoretical problem exists with libraries that register
autoloaders - 'omg they can totally change what code is even going to
be run, let alone the precise semantic behaviour'.

Autoloaders do not change language semantics. They just change one
particular aspect, for which there is specific plugin interface. This is
a proposal to have plugin interface for changing everything in the
language, basically - which may sound cool initially, yay, I can make
PHP do whatever I like - until you consider how these changes may
coexist in a larger project. Loading a class is a pretty
compartmentalized thing, but changing language semantics is not. Where
it is - like meanings of operators - we cautiously allow some plugins.
But I think this one goes too far.

The problem isn't even in bootstrap order and such - while all those are
problems, they can be solved. The problem is you no longer know what
each piece of code means and how it works.

With classes, since you don't have all the classes in the same place,
it's usually solved by having phpdoc - you're supposed to know which
class does what by reading docs. With semantic changes, I don't know how
you're supposed to know which one of 2**N combinations is currently
active on this code.

--
Stas Malyshev
smalyshev@gmail.com

8 years ago by Andrea Faulds

Hi Stas,

I agree with you on all of this.

Stanislav Malyshev wrote:

Hi!

The broader context of this proposal is to provide a simple and usable
mechanism that will allow developers to opt-in to stricter language
semantics on a per-library (or more specifically, per-namespace) basis,
thus alleviating backwards compatibility and library interoperability
concerns for such changes.

I don't think it's a good idea. Not only would we have two language
semantics in one language - which is by itself very far from ideal - but
you'd have absolutely no way of knowing which semantics is active for
which file by just reading that file. You will have to consider all code
that could potentially run up to this point, and all code paths that
could have been taken, and could disable or enable strict context. It
could also mean that the same code could actually run with both models,
depending on the caller - which goes contrary to the whole point of
strict declaration. It's way worse than ini setting - at least ini
setting is supposed to be one for every install of the code and can't
change in runtime depending on code paths.

The strict_types declare was very deliberately made local to individual
files. This means you can always know what mode you're working with by
looking at the top of the file, you don't have to check anything
external, and there's no potential ambiguity.
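
For reference, a minimal file-local example. The twice() function is made up for illustration; the declare on the first line is everything a reader needs to check to know which mode applies to calls made from this file.

```php
<?php
declare(strict_types=1); // applies to calls made from this file only

function twice(int $n): int
{
    return $n * 2;
}

$rejected = false;
try {
    twice("21"); // in weak mode this string would be coerced to int 21
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($rejected); // bool(true)
```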

I don't think the addition of one extra statement at the top of each
file is really so burdensome (especially given it can be automated if
desired) as to justify getting rid of the current clarity explicit
declare()s give us.

Moreover, this RFC clearly builds an infrastructure for making more
semantic forks, eventually leading to the situation where reader of the
code has absolutely no idea, looking at the source of certain function,
what is actually the semantics of the language and the rules it will be
executed under. And neither, even worse, does the author of the code.

If that were localized by file, it'd be bad but one could grudgingly
tolerate it - you could scroll to the beginning of file and say "oh,
sigh, now we're in PHP with strict types, but lax objects, but strict
integers, but lax floats, but strict comparisons, but lax conditionals,
but strict argument counts! Now I understand what's going on if I only
keep in mind those 20 bits that are different in every file!". But after
delocalizing it, all hope is lost - you never know what the code in the
file actually means - because somebody could write code in completely
different file, maybe even JSON composer configuration or some other
config file you didn't even think to be able to change your language
semantics - and suddenly all the code works differently.

Or, for more fun, breaks differently. And you as code author have zero
control over it because of the wonders of shared mutable state which now
encompasses not only data but the very core of the language. Imagine how
fun it is if somebody's action in different code module wouldn't just
mess up some data - it would actually break your code by changing
language semantics for your code!

I don't think it is a good way to write maintainable software.

I similarly feel that a proliferation of semantic forks is not ideal.
From my perspective, strict_types is an exceptional case where I
thought it was worth the slight fragmentation, there wasn't a better way
to achieve the same outcome, and it was important enough to take that
risk. Generally, though, I think it is best to avoid making language
changes optional.

Furthermore, I agree with you that such forks become more confusing if
you can't tell which ones are in use from the file you're in. Haskell,
which has a large number of optional language features, notably
recommends that you turn them on using a pragma in source files (akin to
PHP's declare()) rather than compiler flags.

It is interesting that one of the reasons suggested for implementing
namespace-scoped declares is to enable more of these types of semantic
forks. If having to add extra lines at the top of source files is a
deterrent to adding more of them, I think that may actually be a feature
of our current implementation, not a bug.

Thanks.

--
Andrea Faulds
https://ajf.me/

8 years ago by Dan Ackroyd

From the RFC:

it might be beneficial to add a supports_declare() function, which allows you to determine whether a certain declare directive is supported, without resorting to a PHP version check

Even if we don't have any directives that need to be checked yet, I
think that would be a good thing to have.

It will allow people who wish to experiment with new features to do so
more easily, which will make it easier to evolve the language, without
core PHP needing to be aware of all the directives.
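
A sketch of how calling code might use such a check. supports_declare() is only suggested in the RFC and does not exist in released PHP versions, so its one-argument, bool-returning signature is an assumption; declareIsSupported() is a hypothetical wrapper that falls back to the version check the RFC wants to avoid.

```php
<?php
// Hypothetical wrapper around the proposed supports_declare() function.
function declareIsSupported(string $directive): bool
{
    // Assumed signature: supports_declare(string $name): bool
    if (function_exists('supports_declare')) {
        return supports_declare($directive);
    }

    // Fallback: directives known from released PHP versions.
    $known = ['ticks', 'encoding'];
    if (PHP_VERSION_ID >= 70000) {
        $known[] = 'strict_types';
    }
    return in_array($directive, $known, true);
}

var_dump(declareIsSupported('ticks'));
```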

cheers
Dan

8 years ago by Michael Morris

Hi internals!

I'd like to propose the ability of specifying declare directives like
"strict_types" at the namespace level.

The broader context of this proposal is to provide a simple and usable
mechanism that will allow developers to opt-in to stricter language
semantics on a per-library (or more specifically, per-namespace) basis,
thus alleviating backwards compatibility and library interoperability
concerns for such changes.

https://wiki.php.net/rfc/namespace_scoped_declares

I don't know whether this is the right solution to this problem, so next to
the specifics of this proposal, I'm also open to discussing alternative
approaches, as long as such discussions can be held at a reasonably technical
level (i.e. concrete suggestions rather than vague concepts).

Thanks,
Nikita

Aside from the issues already raised, implementing anything like this would
require reworking how the engine handles namespaces. My understanding is
that namespaces are implemented as a set of string-replacement rules: the
namespace itself is not a structure in PHP. This is also what prevents things
like class privacy, Java's notion of "protected" and so on.
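
A small sketch of what that looks like from userland, assuming a made-up App\Models namespace. It shows only the observable behaviour, not the engine internals.

```php
<?php
namespace App\Models;

class User {}

// Inside the namespace, "User" is shorthand that the compiler expands
// to the fully qualified name.
echo User::class, "\n"; // App\Models\User

// At runtime the class is found under that one flat string; there is no
// namespace object that could be asked about its members or settings.
var_dump(class_exists('App\\Models\\User')); // bool(true)
```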

8 years ago by Rowan Collins

Hi internals!

I'd like to propose the ability of specifying declare directives like
"strict_types" at the namespace level.

The broader context of this proposal is to provide a simple and usable
mechanism that will allow developers to opt-in to stricter language
semantics on a per-library (or more specifically, per-namespace) basis,
thus alleviating backwards compatibility and library interoperability
concerns for such changes.

I'm with others on not liking the idea of adding a new set of "at a
distance" settings into the language. If distributing a package of PHP
code was like distributing a compiled library, then it might make more
sense, but even then we could all end up learning multiple variants of
the same language.

It's interesting that you pick as your example preventing dynamic object
properties. That's definitely a useful option, but it turns out it can
already be done, on a class by class basis. I've put together a quick
library here: https://github.com/IMSoP/php-strict It lets you write this:

class SomeModel {
    use \IMSoP\Strict\Properties;
    public $someProperty;
}

To me, that seems a lot clearer than this:

namespace Some\Project;
class SomeModel {
    public $someProperty;
}

Where "Some\Project" happens to have had a compiler option activated
that injects invisible code to make the class behave more strictly.

If we did want to build such a switch into the language, declaring it
inside the class in a similar way to using a trait would seem pretty
sensible to me.
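
For readers wondering how a trait can do this at all, here is a minimal sketch using the __get and __set magic methods. It is not the code of the IMSoP\Strict\Properties trait linked above; the StrictProperties name and the choice of LogicException are made up for illustration.

```php
<?php
// Hypothetical trait: reject reads and writes of undeclared properties.
trait StrictProperties
{
    public function __set($name, $value)
    {
        throw new \LogicException('Undefined property: ' . static::class . "::\$$name");
    }

    public function __get($name)
    {
        throw new \LogicException('Undefined property: ' . static::class . "::\$$name");
    }
}

class SomeModel
{
    use StrictProperties;
    public $someProperty;
}

$model = new SomeModel();
$model->someProperty = 1; // declared property: __set() is not involved

$message = null;
try {
    $model->somePropertyTypo = 2; // undeclared: routed through __set()
} catch (\LogicException $e) {
    $message = $e->getMessage();
}

echo $message, "\n"; // Undefined property: SomeModel::$somePropertyTypo
```

The opt-in is visible in the class body, which is the property of this approach that the message above argues for.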

namespace Symfony\Whatever\Namespace;
function strlen($str) {
    return \strlen($str) + 1;
}
// Repeat for a few more namespaces.

This will end up hijacking uses of the strlen() function within the Symfony
codebase due to the way the global namespace fallback works.

Well, as the recent autoloading discussion showed, that global namespace
fallback is not without its problems. And it looks like 7.0 has finally
removed the ability to redefine "true" and "false" as namespace
constants, although I'm not sure if this was just a side-effect of the
phpng work: https://3v4l.org/S2AFZ

Regards,

--
Rowan Collins
[IMSoP]