Covariance, again - Externals

6 years ago by Andreas Hennings

There were discussions about covariance and contravariance in the past.
https://externals.io/message/98085#98105
Unfortunately I was not subscribed back then, so I cannot respond to anything.
So, here it goes again.

With co- and contravariance, the following would be possible:

contravariance.php - https://3v4l.org/I3v0u
covariance.php - https://3v4l.org/i79O5

(from guilhermeblanco's older email in "PHP's support to
contravariance and covariance")

The main problem was expressed by Levi Morrison in this older thread.

Currently we do not autoload classes in type hints.
https://3v4l.org/sFsDd
In the example I can declare "UnknownClass" as a return type hint, and
PHP won't care.

However, to validate if a return type matches with the parent
definition, the class must be autoloaded first.
Or rather:

If the return type is identical with the parent, PHP can say "yes"
with no class loading required.
If the return type is different from the parent:
-- Currently, PHP simply says "no" (Fatal error: Declaration of
C::foo(): C must be compatible with I::foo()).
-- To support covariance, PHP would have to autoload the class in the
type hint, and then check the hierarchy.
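The decision described above can be sketched in userland PHP. This is only an illustration and not engine code: the helper name isCompatibleReturn() and the Shape/Circle interfaces are hypothetical, and class_exists()/interface_exists() stand in for whatever lookup the engine would actually perform.

```php
<?php
declare(strict_types=1);

// Hypothetical types, used only for this illustration.
interface Shape {}
interface Circle extends Shape {}

/**
 * Hypothetical helper (not engine code): may $childType replace
 * $parentType as the return type of an overriding method?
 */
function isCompatibleReturn(string $childType, string $parentType): bool
{
    // Identical names: say "yes" without loading anything.
    if (strcasecmp($childType, $parentType) === 0) {
        return true;
    }

    // Different names: the child type has to be loaded so that its
    // hierarchy can be inspected. Both calls may trigger the autoloader.
    if (!class_exists($childType) && !interface_exists($childType)) {
        return false;
    }

    // Accept only if the child type is a subtype of the parent type.
    return is_subclass_of($childType, $parentType, true);
}

$identical = isCompatibleReturn('UnknownClass', 'UnknownClass');
$narrower  = isCompatibleReturn('Circle', 'Shape');
$wider     = isCompatibleReturn('Shape', 'Circle');
$missing   = isCompatibleReturn('MissingClass', 'Shape');
```

Only the second branch ever touches the autoloader, which is the cost the rest of this thread is concerned with.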

Solutions proposed in old thread

Levi Morrison:

You need to adjust the passes over the code to register symbols and
their declared relationships, and then in a separate pass validate
them. After that if the class isn't found then you trigger an
autoload.

It's doable, it just hasn't been done.

Christoph M. Becker:

An alternative might be forward class declarations:

What I propose instead

I think it is actually not that complicated, and it can be done without
a BC break and without forward type hints.
This only gives us covariance. Contravariance is another story.

When a class declaration is executed, do the following:

Parse the class AST (obviously).
Autoload all identifiers in "extends" and "implements".
Autoload all identifiers in return type hints that are not identical
with the parent return type hint.

If such a class is not found, report "Return type must either be
identical with the parent, or it must be an existing or autoloadable
class or interface."
Well, or any message that clarifies why this one was autoloaded, while
other type hint classes are not autoloaded.

So, this behavior is the same as today, except for the case of return
type hints that differ from the parent, which currently result in
fatal error.
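As a rough model of the steps above, the following sketch computes which names would have to be autoloaded for one class declaration. The function namesToAutoload() and the array shape describing a declaration are assumptions made for illustration; the engine works on its own internal structures.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical model of the proposal (not engine code).
 *
 * @param array $decl          ['extends' => string[], 'implements' => string[],
 *                              'returns' => [method => return type]]
 * @param array $parentReturns [method => return type declared by the parent]
 * @return string[] names that would have to be autoloaded
 */
function namesToAutoload(array $decl, array $parentReturns): array
{
    // "extends" and "implements" are always loaded, as they are today.
    $names = array_merge($decl['extends'], $decl['implements']);

    foreach ($decl['returns'] as $method => $type) {
        $parentType = $parentReturns[$method] ?? null;
        // Only an overriding method whose return type differs from the
        // parent's return type forces a load.
        if ($parentType !== null && strcasecmp($type, $parentType) !== 0) {
            $names[] = $type;
        }
    }

    return array_values(array_unique($names));
}

$decl = [
    'extends'    => [],
    'implements' => ['I'],
    'returns'    => ['foo' => 'C', 'bar' => 'UnknownClass', 'baz' => 'Other'],
];
$result = namesToAutoload($decl, ['foo' => 'I', 'bar' => 'UnknownClass']);
// $result is ['I', 'C']: "bar" is identical to the parent, "baz" overrides nothing.
```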

Does this sound doable? Am I missing something else?

-- Andreas

6 years ago by Levi Morrison

I believe your algorithm fails on this simple setup:

<?php

interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}

interface X {
    function bar(): A;
}

interface Y extends X {
    function bar(): B;
}

If I correctly typed this from memory there is no way to order this
such that all units are defined ahead of time as needed for verifying
correctness. This means we trigger the autoloader even though the type
is defined in the same file. Even if we do some more complicated
compile-time passes we'd fail on things like this:

<?php // file1.php

interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}

if (getenv("ENABLE_X")) {
    interface X {
        function bar(): A;
    }
}
?>
<?php // file2.php

interface Y extends X {
    function bar(): B;
}

This case shows that care needs to be taken to get the order down
correctly even if we autoload:

<?php
interface A {
    function foo(): X;
}

interface B extends A {
    function foo(): Y;
}
?>
<?php // X.php
interface X {
    function bar(): A;
}
?>
<?php // Y.php

interface Y extends X {
    function bar(): C;
}
// At this point the engine will need to verify A and C but we may not
// have finished verifying A and B yet
?>

All-in-all I don't think we can resolve every case cleanly because we
do not have purely ahead-of-time compilation for all units involved. I
think every method of implementing this feature has drawbacks and we
need to thoughtfully evaluate them.

6 years ago by Andreas Hennings

Let me address the simple example first.

You need to compile all classes in the file, and then do the autoloading.
So maybe my description of the algorithm is too simple.

However, this is not really new, and is not really a problem I would say.
What about this: https://3v4l.org/5klJQ

<?php
class C implements I {}

interface I {}
?>

Here the interface I is declared after the class C that implements the
interface.
This means the autoloading for identifiers in "extends" or
"implements" clauses already needs to wait for the current file to be
fully processed.

So yes, we need to process the entire file before autoloading anything.
But we already do that for inheritance. So it is nothing new.

6 years ago by Andreas Hennings

Ok, I think I missed the circularity aspect in your examples.
Inheritance by itself is never circular.
However, return types can make this entire thing circular.

So the problem would be if we try to autoload the same thing that is
currently in the process of being defined.

Maybe we could generate similar circularity problems with class_exists() calls?
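One way to probe the class_exists() question is a small experiment in which two classes ask for each other while they are being autoloaded. The sketch below is an assumption-laden simulation: the names DemoA and DemoB are hypothetical, and eval() stands in for including one file per class.

```php
<?php
declare(strict_types=1);

$seen = [];

spl_autoload_register(function (string $name) use (&$seen): void {
    $seen[] = "autoload $name";

    if ($name === 'DemoA') {
        // "DemoA.php" asks for DemoB before DemoA itself is declared.
        $seen[] = 'DemoB exists? ' . var_export(class_exists('DemoB'), true);
        eval('class DemoA {}');
    } elseif ($name === 'DemoB') {
        // "DemoB.php" asks for DemoA, which is still being loaded.
        $seen[] = 'DemoA exists? ' . var_export(class_exists('DemoA'), true);
        eval('class DemoB {}');
    }
});

class_exists('DemoA');

// Print the order of events for inspection.
echo implode(PHP_EOL, $seen), PHP_EOL;
```

The inner class_exists('DemoA') call is expected to report false instead of recursing, since PHP does not re-enter the autoloader for a name it is already loading. So the circle ends quietly here, with a class that is reported as missing although it is about to be declared. A covariance check that hits the same situation could not simply carry on with "false".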

6 years ago by Andreas Hennings

Another comment I want to make here:
The examples you give each have multiple class declarations per file.
I would personally not care much if these result in a fatal error.
All of this code used to be illegal until now (because no covariance
support), so it would not be a BC problem if some of it continues to
be illegal.

This being said: I think we can probably construct examples that have
one-class-per-file, but that still have a circularity problem due to
covariance.
Or possibly even with class_exists()?
I am going to play around a bit.

6 years ago by Levi Morrison

Just because it isn't a backwards compatibility break doesn't mean
it's a good way forward. Right now people have covariant returns -
they just don't express it in the signature because we don't allow it.
Wouldn't it seem odd that if the bodies of the methods stayed the same
and all they did was update the signature that it somehow breaks their
code?

I have some ideas about minimizing this impact but I really think we
ought to tackle covariant returns and contravariant parameters at the
same time. Any endeavor to add one should add the other to create a
cohesive design that works in both cases.

6 years ago by Andreas Hennings

I really think we ought to tackle covariant returns and contravariant parameters

My "algorithm" could be extended for contravariance:
Whenever a method has a parameter type hint that differs from the
parent type hint, autoload the class of the parent type hint.
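For illustration, this is the kind of declaration the contravariance rule is about. The names Animal, Dog, DogHandler and AnimalHandler are hypothetical, and the sketch assumes a PHP version that accepts parameter contravariance (7.4 or later). To verify AnimalHandler::handle(), the engine has to know that Dog, the parent's parameter type, is a subtype of Animal, which is why the parent's type is the one that would need loading.

```php
<?php
declare(strict_types=1);

// Hypothetical types, used only for this illustration.
interface Animal {}

final class Dog implements Animal {}

interface DogHandler
{
    public function handle(Dog $dog): string;
}

// The implementation widens the parameter type from Dog to Animal.
final class AnimalHandler implements DogHandler
{
    public function handle(Animal $animal): string
    {
        return get_class($animal);
    }
}

$handler = new AnimalHandler();
$name = $handler->handle(new Dog());
// $name is 'Dog'
```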

I think I know too little about the internal workings of PHP to
understand why your examples would break.
I think we should give it a try, and write your examples as unit tests
to try to crash it.

If we indeed can produce circularity problems, then I might have more ideas.
I think the main idea in my algorithm about which classes need to be
autoloaded and which don't is good.
Maybe at some point we need to write a class into the table of defined
classes, before it is fully verified.

At some point, for a different purpose, I thought about "stub"
classes, which have all the information from the declaration itself,
but not from any parent class. So we could write classes into the stub
table and then later write the completed thing into the actual class
table/list.
But maybe we don't need to go that far.
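A very rough model of the stub idea, applied to the circular A/B/X/Y example from earlier in the thread. Everything here is hypothetical: the StubRegistry class, its methods, and the way overrides are passed in are illustrative and say nothing about how the engine stores classes.

```php
<?php
declare(strict_types=1);

/** Hypothetical stub table (not engine code). */
final class StubRegistry
{
    /** @var array<string, string[]> name => declared parents, known right after parsing */
    private array $stubs = [];

    /** @var array<string, bool> names whose signatures have been verified */
    private array $verified = [];

    public function addStub(string $name, array $parents = []): void
    {
        $this->stubs[$name] = $parents;
    }

    /** Subtype test that only needs the stub information. */
    public function isSubtype(string $child, string $parent): bool
    {
        if ($child === $parent) {
            return true;
        }
        foreach ($this->stubs[$child] ?? [] as $declaredParent) {
            if ($this->isSubtype($declaredParent, $parent)) {
                return true;
            }
        }
        return false;
    }

    /** @param array $overrides method => [child return type, parent return type] */
    public function verify(string $name, array $overrides): bool
    {
        foreach ($overrides as [$childReturn, $parentReturn]) {
            if (!$this->isSubtype($childReturn, $parentReturn)) {
                return false;
            }
        }
        $this->verified[$name] = true;
        return true;
    }

    public function isVerified(string $name): bool
    {
        return isset($this->verified[$name]);
    }
}

// First pass: register every declaration as a stub.
$registry = new StubRegistry();
$registry->addStub('A');
$registry->addStub('B', ['A']);
$registry->addStub('X');
$registry->addStub('Y', ['X']);

// Second pass: verify the covariant returns in any order.
$bOk = $registry->verify('B', ['foo' => ['Y', 'X']]);
$yOk = $registry->verify('Y', ['bar' => ['B', 'A']]);
```

Because isSubtype() reads only the stub table, B can be checked against Y before Y itself has been verified, which is the circularity that a single pass cannot resolve.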

If I were to do this, it would be my first shot at the PHP engine
itself. It is not going to happen today, but maybe I will find time for
it in a month or so.
I am sure if/when I have done this, I can write more knowledgeable
posts here on this mailing list.
Of course if someone else wants to step up, go ahead.

I think the goal and spec is pretty clear. So once we have a
proof-of-concept implementation and show that it can be done and the
problems can be solved, it would be straightforward to make an RFC.
If we would do the RFC first, people would have to vote on something
which is unclear if it can be implemented.
