Non-nullable properties

7 years ago by Rowan Collins — view source

unread

Hi all,

As briefly mentioned, I think the approach to non-nullable types in the typed properties proposal needs more discussion, to make sure we're not repeating Tony Hoare's "billion-dollar mistake".

For parameter hints, nullability is naturally optional, particularly in PHP where "nullable Foo" is equivalent to a union type "null or Foo". For properties, this isn't the case, because every property must have some initial state.

The simplest solution, which I think would be a sensible starting point if we're not ready for more complex changes, is to say that all typed properties must have a default value, and if that default value is null, the type hint must naturally be nullable. In other words, make "public Foo $foo" illegal, but allow "public ?Foo $foo=null".

The current RFC proposes the next simplest solution, which is to allow non-nullable types, and trust the user to initialise them before use.

My first objection to this is that it creates an entirely new state for object properties to be in, which behaves differently from everything else in the language. There will then be three or four different ways for an object property to be "unset":

not declared at all; access gives a notice and the implicit value null
declared with no default, uninitialised, implicitly null
declared with a default of null, or explicitly assigned as null
declared with a non-nullable type, never initialised; rather than an implicit null, access generates an Error

The bigger problem is that the properties are non-nullable in name only, and the declaration can't be trusted. Consider the following class:

class Validity {
public \DateTimeInterface $validFrom;
public ?\DateTimeInterface $validTo = null;
}

If I have an instance $v of that class, I would expect to be able to safely call $v->validFrom->getTimestamp(), but need to check for nulls before doing the same with validTo. Under the current proposal, doing so will risk errors, so I will have to check both properties before access anyway. Worse, using is_null() or ?: will attempt to retrieve the value, so I actually have to be more careful, and use isset() or ?? instead.

Alternatively, I can trust the author of the class, but at that point they might as well just use a docblock - the type hint documents their intent, but is not enforced.

Swift is often cited as a language which gets nullability right, and a lot of attention is given to Options, and the compiler ensuring that the None / null case is handled; but at least as important is how it handles initialisation of non-nullable properties, using "two-phase initialisation": https://docs.swift.org/swift-book/LanguageGuide/Initialization.html

In short, Swift defines a specific point after which all properties must have a valid value; before this point, the entire object is invalid for read operations, so there is no need for a specific error when accessing uninitialised properties.

Swift marks this point by the call to the parent initialiser, ultimately going up the chain to the root object. PHP has no root object, and no mandatory parent constructor calls, so would need a different way to mark this stage. Constructors can already violate the rules the first phase should impose, so we can't just use the end of the constructor as the validation point.

One possibility would be to add a new keyword like "initialize" which must be added to constructors in the presence of non-nullable properties. Above that keyword, use of $this other than to write to its properties would be an error; afterwards, the constructor could carry on as normal.

As I say, this is complex, and it may be best to add nullable typed properties first, and give more time to try out approaches to initialisation.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Levi Morrison — view source

unread

Hi all,

As briefly mentioned, I think the approach to non-nullable types in the typed properties proposal needs more discussion, to make sure we're not repeating Tony Hoare's "billion-dollar mistake".

For parameter hints, nullability is naturally optional, particularly in PHP where "nullable Foo" is equivalent to a union type "null or Foo". For properties, this isn't the case, because every property must have some initial state.

The simplest solution, which I think would be a sensible starting point if we're not ready for more complex changes, is to say that all typed properties must have a default value, and if that default value is null, the type hint must naturally be nullable. In other words, make "public Foo $foo" illegal, but allow "public ?Foo $foo=null".

The current RFC proposes the next simplest solution, which is to allow non-nullable types, and trust the user to initialise them before use.

My first objection to this is that it creates an entirely new state for object properties to be in, which behaves differently from everything else in the language. There will then be three or four different ways for an object property to be "unset":

not declared at all; access gives a notice and the implicit value null

declared with no default, uninitialised, implicitly null

declared with a default of null, or explicitly assigned as null

declared with a non-nullable type, never initialised; rather than an implicit null, access generates an Error

The bigger problem is that the properties are non-nullable in name only, and the declaration can't be trusted. Consider the following class:

class Validity {
public \DateTimeInterface $validFrom;
public ?\DateTimeInterface $validTo = null;
}

Even if this had a constructor we couldn't "trust" it because there
are ways to skip constructors.

If I have an instance $v of that class, I would expect to be able to safely call $v->validFrom->getTimestamp(), but need to check for nulls before doing the same with validTo. Under the current proposal, doing so will risk errors, so I will have to check both properties before access anyway. Worse, using is_null() or ?: will attempt to retrieve the value, so I actually have to be more careful, and use isset() or ?? instead.

Alternatively, I can trust the author of the class, but at that point they might as well just use a docblock - the type hint documents their intent, but is not enforced.

Swift is often cited as a language which gets nullability right, and a lot of attention is given to Options, and the compiler ensuring that the None / null case is handled; but at least as important is how it handles initialisation of non-nullable properties, using "two-phase initialisation": https://docs.swift.org/swift-book/LanguageGuide/Initialization.html

In short, Swift defines a specific point after which all properties must have a valid value; before this point, the entire object is invalid for read operations, so there is no need for a specific error when accessing uninitialised properties.

Swift marks this point by the call to the parent initialiser, ultimately going up the chain to the root object. PHP has no root object, and no mandatory parent constructor calls, so would need a different way to mark this stage. Constructors can already violate the rules the first phase should impose, so we can't just use the end of the constructor as the validation point.

One possibility would be to add a new keyword like "initialize" which must be added to constructors in the presence of non-nullable properties. Above that keyword, use of $this other than to write to its properties would be an error; afterwards, the constructor could carry on as normal.

As I say, this is complex, and it may be best to add nullable typed properties first, and give more time to try out approaches to initialisation.

Again, as constructors can be skipped this solution does little to
ensure validity. The approach from the RFC is effectively required
because of our existing feature set.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Rowan Collins — view source

unread

class Validity {
public \DateTimeInterface $validFrom;
public ?\DateTimeInterface $validTo = null;
}

Even if this had a constructor we couldn't "trust" it because there
are ways to skip constructors.

Then the mechanisms should either be impossible with non-nullable properties, or have a way to mark the object initialised. If not, the object created would not be fully initialised, so accessing the entire object should be an error, not just accessing the specific properties which are marked non-nullable.

I'm not claiming any of this is easy, but if you can't guarantee a property has a valid value, what is the type hint actually for?

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Levi Morrison — view source

unread

class Validity {
public \DateTimeInterface $validFrom;
public ?\DateTimeInterface $validTo = null;
}

Even if this had a constructor we couldn't "trust" it because there
are ways to skip constructors.

Then the mechanisms should either be impossible with non-nullable properties, or have a way to mark the object initialised. If not, the object created would not be fully initialised, so accessing the entire object should be an error, not just accessing the specific properties which are marked non-nullable.

I'm not claiming any of this is easy, but if you can't guarantee a property has a valid value, what is the type hint actually for?

Whether it's a constructor, a factory method, or just a block of code
near the new site: something has the responsibility for
initialization. Once initialized the object provides guarantees unless
you unset the property. It also provides the guarantee that if it is
set then it will always be of the outlined type; this is a much
stronger guarantee than we have today, which is... nothing.

It's worth discussing, and if someone has a brilliant idea maybe we
can proceed. However, in the absence of such an idea am I fine with
the outlined semantics of the RFC.

7 years ago by Rowan Collins — view source

unread

Whether it's a constructor, a factory method, or just a block of code
near the new site: something has the responsibility for
initialization.

Indeed, and my suggestion is to formalise that responsibility, so that mistakes can be detected closer to their source - which is, after all, the whole point of type constraints.

Once initialized the object provides guarantees unless
you unset the property.

But since you have no way of knowing if it was initialized or not, this isn't a guarantee you can actually base code on.

It also provides the guarantee that if it is
set then it will always be of the outlined type; this is a much
stronger guarantee than we have today, which is... nothing.

I'm not saying we should block the whole feature (apologies if that wasn't clear), so that's not the appropriate comparison. The comparison I'm interested in is between writing...

/**

Will be initialised in the constructor and should not be unset.
*/
public Foo $foo;

... and ...

/**

Will be initialised in the constructor and should not be set to null.
*/
public ?Foo $foo=null;

In both cases, the intent is documented, but not enforced.

If we allow the first version, with no actual checks on the initialisation, it will be very hard to introduce a stricter version later. If we only allow the second version in the initial implementation, we lose very little.

The rule could be very simple: typed properties must specify a valid default value. That means object hints will have to be nullable, with a default of null, and there is no need to invent a new kind of "null reference".

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Dan Ackroyd — view source

unread

Hi all,

The current RFC proposes the next simplest solution, which is to allow non-nullable types, and trust the user to initialise them before use.

From the RFC:

If a typed property does not have a default value, no implicit null default value is implied (even if the property is nullable).
Instead, the property is considered to be uninitialized. Reads from uninitialized properties will generate a TypeError

Are you miss-quoting the RFC?

Because those two sentences are completely different.

cheers
Dan

7 years ago by Rowan Collins — view source

unread

On 14 July 2018 at 14:09, Rowan Collins rowan.collins@gmail.com
wrote:

Hi all,

The current RFC proposes the next simplest solution, which is to
allow non-nullable types, and trust the user to initialise them before
use.

From the RFC:

If a typed property does not have a default value, no implicit null
default value is implied (even if the property is nullable).
Instead, the property is considered to be uninitialized. Reads from
uninitialized properties will generate a TypeError

Are you miss-quoting the RFC?

Because those two sentences are completely different.

There's no contradiction here; throwing an error when a property is read is not the same as enforcing that it always has a valid value.

Compare with parameter type hints:

function thing(Foo $foo) {
$foo->bar();
}
thing(null);

The line that throws the error is thing(null), and that's the line that needs fixing. We don't allow the parameter in, and then throw an error when trying to use it, leaving the user to work out how it got there; that would defeat the purpose of the type hint.

The difference between "uninitialised" and "null" is largely irrelevant: either way, the only way to run code like $foo->bar->baz() safely is to test the property before use. Or, to simply trust the user not to make a mistake.

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Dan Ackroyd — view source

unread

There's no contradiction here; throwing an error when a
property is read is not the same as enforcing that it always has a valid value.

That's true, but claiming the RFC just 'trusts' the users to
initialise them is a miss-representation of the RFC to me. The RFC
catches the mistake if the programmer fails to initialise the variable
then reads from it.

If not, the object created would not be fully initialised,

This is always going to be the case, unless you're proposing to
deprecate ReflectionClass::newInstanceWithoutConstructor .... which is
going to break a huge number of applications.

tbh, I really don't see a real problem that needs to be fixed. For me
it is the equivalent of this:

function foo() : int {
return "five";
}

This gives an error when the code is run, not when the code is
compiled, despite it being a 'detectable' error. In most compiled
languages, this error would be detected at the compilation step. In
'dynamic' languages it is detected at run-time.

The same is true for classes with typed properties. Yes, if you make a
mistake and forget to set a typed property before using an object
there will be an error.

That would only be a problem in production if you don't run that code
at all in development or in test....which is true of many, many
things.

cheers
Dan

7 years ago by Rowan Collins — view source

unread

On 16 July 2018 at 09:43, Rowan Collins rowan.collins@gmail.com
wrote:

There's no contradiction here; throwing an error when a
property is read is not the same as enforcing that it always has a
valid value.

That's true, but claiming the RFC just 'trusts' the users to
initialise them is a miss-representation of the RFC to me. The RFC
catches the mistake if the programmer fails to initialise the variable
then reads from it.

Perhaps a better wording is that it requires one user to trust another user: the mistake that leads to an unitialized property is the responsibility of the author of the class, but the error only appears when the object is used, which could be in a completely different library.

If not, the object created would not be fully initialised,

This is always going to be the case, unless you're proposing to
deprecate ReflectionClass::newInstanceWithoutConstructor .... which is
going to break a huge number of applications.

It only needs to be banned (or modified) for classes which claim their properties are non-nullable. Or, don't introduce non-nullable properties, because they can't actually be enforced.

For me
it is the equivalent of this:

function foo() : int {
return "five";
}

This gives an error when the code is run, not when the code is
compiled

It's not about run-time versus compile-time, it's about when the error is raised. In this case, it's raised when the return statement is executed. The equivalent to the proposed behaviour would be this:

function foo() : int {
return "five";
}
$foo = foo(); // runs OK, but marks the variable as invalid
// ... Many lines of code later
// $foo has not been touched so should be an integer
$bar = $foo + 1; // Error! Sorry, somebody messed up a return value somewhere, it's now up to you to figure out where

That would only be a problem in production if you don't run that code
at all in development or in test....

If you are going to write tests to see if all your variables are being initialized correctly, then you don't need non-nullable type hints, you can just assertNotNull on each property.

But again, the user writing those tests is not the user relying on them. You are asking users of a library to trust that the library has appropriate tests.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Marco Pivetta — view source

unread

These don't really need explicit tests in most cases, but rather static
analysis (currently happening via docblocks). Static analysis tools like
vimeo/psalm already pick this up.

I'd even be happy to get type hints that only have effect on
ReflectionProperty#getType() as a massive improvement over the current
situation we've monkey-patched us into, but the patch is indeed consistent
with PHP's current state.

On 16 July 2018 at 09:43, Rowan Collins rowan.collins@gmail.com
wrote:

There's no contradiction here; throwing an error when a
property is read is not the same as enforcing that it always has a
valid value.

That's true, but claiming the RFC just 'trusts' the users to
initialise them is a miss-representation of the RFC to me. The RFC
catches the mistake if the programmer fails to initialise the variable
then reads from it.

Perhaps a better wording is that it requires one user to trust another
user: the mistake that leads to an unitialized property is the
responsibility of the author of the class, but the error only appears when
the object is used, which could be in a completely different library.

If not, the object created would not be fully initialised,

This is always going to be the case, unless you're proposing to
deprecate ReflectionClass::newInstanceWithoutConstructor .... which is
going to break a huge number of applications.

It only needs to be banned (or modified) for classes which claim their
properties are non-nullable. Or, don't introduce non-nullable properties,
because they can't actually be enforced.

For me
it is the equivalent of this:

function foo() : int {
return "five";
}

This gives an error when the code is run, not when the code is
compiled

It's not about run-time versus compile-time, it's about when the error is
raised. In this case, it's raised when the return statement is executed.
The equivalent to the proposed behaviour would be this:

function foo() : int {
return "five";
}
$foo = foo(); // runs OK, but marks the variable as invalid
// ... Many lines of code later
// $foo has not been touched so should be an integer
$bar = $foo + 1; // Error! Sorry, somebody messed up a return value
somewhere, it's now up to you to figure out where

That would only be a problem in production if you don't run that code
at all in development or in test....

If you are going to write tests to see if all your variables are being
initialized correctly, then you don't need non-nullable type hints, you can
just assertNotNull on each property.

But again, the user writing those tests is not the user relying on them.
You are asking users of a library to trust that the library has appropriate
tests.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Rowan Collins — view source

unread

These don't really need explicit tests in most cases, but rather static
analysis (currently happening via docblocks). Static analysis tools like
vimeo/psalm already pick this up.

Then why do we need the type hints at all?

I'd even be happy to get type hints that only have effect on
ReflectionProperty#getType() as a massive improvement over the current
situation we've monkey-patched us into, but the patch is indeed consistent
with PHP's current state.

I disagree that it's consistent. It introduces an entirely new kind of null
reference ("declared but uninitialised"), which will require more careful
checks than forcing the value to be initially null, and will undoubtedly
further annoy those who already claim that isset() is broken (the RFC
doesn't even mention it, but I presume uninitialised properties will return
false from isset(), but throw an error from is_null()).

If all typed properties require a valid default value, the patch is
simpler, errors are raised at a line where you can fix them, and the
language remains consistent - if you're promised a Foo, you get a Foo, not
a "whoops, why isn't there a Foo here?" error.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Marco Pivetta — view source

unread

There are naturally 3 states in the engine:

1 - value set
2 - value not set (default null)
2 - undefined/uninitialised

These have been around since 5.0 AFAIK.

These don't really need explicit tests in most cases, but rather static
analysis (currently happening via docblocks). Static analysis tools like
vimeo/psalm already pick this up.

Then why do we need the type hints at all?

I'd even be happy to get type hints that only have effect on
ReflectionProperty#getType() as a massive improvement over the current
situation we've monkey-patched us into, but the patch is indeed
consistent
with PHP's current state.

I disagree that it's consistent. It introduces an entirely new kind of null
reference ("declared but uninitialised"), which will require more careful
checks than forcing the value to be initially null, and will undoubtedly
further annoy those who already claim that isset() is broken (the RFC
doesn't even mention it, but I presume uninitialised properties will return
false from isset(), but throw an error from is_null()).

If all typed properties require a valid default value, the patch is
simpler, errors are raised at a line where you can fix them, and the
language remains consistent - if you're promised a Foo, you get a Foo, not
a "whoops, why isn't there a Foo here?" error.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Rowan Collins — view source

unread

There are naturally 3 states in the engine:

1 - value set
2 - value not set (default null)
3 - undefined/uninitialised

These have been around since 5.0 AFAIK.

"Undefined" and "uninitialised" are not the same state:

class A {
public $alpha = 42;
public $beta;
// no such property as $charlie;
public SomeClass $delta;
}
$a = new A;

$a->alpha; // value set
$a->beta; // value not set (default null)
$a->charlie; // undefined, but still accessible, with implicit value null
$a->delta; // uninitialised; all attempts to access will throw an error

The behaviour of $a->charlie is consistent with other undefined variables
(e.g. a local variable can be read before it is written to). I can't think
of anything in the language which behaves the same way as $a->delta.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Larry Garfield — view source

unread

These don't really need explicit tests in most cases, but rather static
analysis (currently happening via docblocks). Static analysis tools like
vimeo/psalm already pick this up.

Then why do we need the type hints at all?

I'd even be happy to get type hints that only have effect on
ReflectionProperty#getType() as a massive improvement over the current
situation we've monkey-patched us into, but the patch is indeed consistent
with PHP's current state.

I disagree that it's consistent. It introduces an entirely new kind of null
reference ("declared but uninitialised"), which will require more careful
checks than forcing the value to be initially null, and will undoubtedly
further annoy those who already claim that isset() is broken (the RFC
doesn't even mention it, but I presume uninitialised properties will return
false from isset(), but throw an error from is_null()).

If all typed properties require a valid default value, the patch is
simpler, errors are raised at a line where you can fix them, and the
language remains consistent - if you're promised a Foo, you get a Foo, not
a "whoops, why isn't there a Foo here?" error.

Regards,

This is a possibly highly stupid idea, but I'll throw it out anyway so that it
can at least be explicitly rejected in that case...

Since Swift was mentioned before as a good example, what about borrowing from
Swift? Viz:

class Foo {

protected Bar $b;

// This runs before __construct, is not inherited, and cannot
// ever be skipped. If this method exits and any property is still
// value-less, TypeError immediately.
protected function __init() {
$this->b = new Bar();
}

public function __construct() {
Behaves as it always has.
}
}

That's similar to the "initialize" flag inside the constructor, but splits it
off to a separate method that is devoted to just that purpose; it therefore
has no parameters, ever, and if you want to initialize an object property
based on a constructor argument then you must either make it explicitly
nullable or initialize it to a dummy value first.

Which... I realize may not work well for service dependencies as the
dependency chain could get deep. Of course, that use case is generally a
protected property not publicly accessible, so making it nullable and trusting
yourself to populate it in the constructor is likely "safe enough".

Thoughts?

--Larry Garfield

7 years ago by Rowan Collins — view source

unread

class Foo {

protected Bar $b;

// This runs before __construct, is not inherited, and cannot
// ever be skipped. If this method exits and any property is still
// value-less, TypeError immediately.
protected function __init() {
$this->b = new Bar();
}

public function __construct() {
Behaves as it always has.
}
}

That's similar to the "initialize" flag inside the constructor, but splits
it
off to a separate method that is devoted to just that purpose; it
therefore
has no parameters, ever, and if you want to initialize an object property
based on a constructor argument then you must either make it explicitly
nullable or initialize it to a dummy value first.

The extra method certainly feels more "PHP-like" than an extra keyword, but
as you say, calling it automatically makes it awkward to initialise based
on any kind of input.

A couple of variations on the theme:

a) __init() takes the same arguments as __construct(), but is called first.
This might be rather confusing, though - we could insist on the signatures
matching, but people might be surprised that their constructor parameters
are passed to their object twice in different methods.

b) __init() is not called automatically, but has to be called manually,
with whatever parameters you want, as the first line of __construct(). This
feels a bit weird, because no other method name causes special behaviour
when called manually. On the other hand, it allows it to be called in other
situations, such as unserialize, which by-pass the constructor.

A compromise system which doesn't need new keywords or magic would be to
combine the current error-on-access behaviour with a check at the end of
the constructor.

So:

= as soon as the constructor starts, $this is available as normal
= accessing non-nullable properties of $this before initialising them would
give an error (as in the current proposal)

when the constructor returns, the engine checks that all non-nullable
properties have been correctly initialised, and immediately throws an error
if they have not
Serializable#unserialize() could run the same check, since it is
effectively a constructor
= ReflectionClass#newInstanceWithoutConstructor could just allow the
incomplete object, since such an object is not likely to work smoothly
anyway

Since $this can be passed to other functions and methods by the
constructor, there is still a chance of code outside the class seeing an
incomplete object and getting errors, but the distance between cause and
effect (which is my main objection to the current proposal) is massively
reduced.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Christoph M. Becker — view source

unread

class Foo {

protected Bar $b;

// This runs before __construct, is not inherited, and cannot
// ever be skipped. If this method exits and any property is still
// value-less, TypeError immediately.
protected function __init() {
$this->b = new Bar();
}

public function __construct() {
Behaves as it always has.
}
}

That's similar to the "initialize" flag inside the constructor, but splits
it
off to a separate method that is devoted to just that purpose; it
therefore
has no parameters, ever, and if you want to initialize an object property
based on a constructor argument then you must either make it explicitly
nullable or initialize it to a dummy value first.

The extra method certainly feels more "PHP-like" than an extra keyword, but
as you say, calling it automatically makes it awkward to initialise based
on any kind of input.

A couple of variations on the theme:

a) __init() takes the same arguments as __construct(), but is called first.
This might be rather confusing, though - we could insist on the signatures
matching, but people might be surprised that their constructor parameters
are passed to their object twice in different methods.

b) __init() is not called automatically, but has to be called manually,
with whatever parameters you want, as the first line of __construct(). This
feels a bit weird, because no other method name causes special behaviour
when called manually. On the other hand, it allows it to be called in other
situations, such as unserialize, which by-pass the constructor.

A compromise system which doesn't need new keywords or magic would be to
combine the current error-on-access behaviour with a check at the end of
the constructor.

So:

= as soon as the constructor starts, $this is available as normal
= accessing non-nullable properties of $this before initialising them would
give an error (as in the current proposal)

when the constructor returns, the engine checks that all non-nullable
properties have been correctly initialised, and immediately throws an error
if they have not

Serializable#unserialize() could run the same check, since it is
effectively a constructor
= ReflectionClass#newInstanceWithoutConstructor could just allow the
incomplete object, since such an object is not likely to work smoothly
anyway

Since $this can be passed to other functions and methods by the
constructor, there is still a chance of code outside the class seeing an
incomplete object and getting errors, but the distance between cause and
effect (which is my main objection to the current proposal) is massively
reduced.

It seems to me that either of these proposals would render the lazy
initialization pattern outlined in the “Overloaded Properties”
section[1] invalid.

[1] https://wiki.php.net/rfc/typed_properties_v2#overloaded_properties

--
Christoph M. Becker

7 years ago by Rowan Collins — view source

unread

It seems to me that either of these proposals would render the lazy
initialization pattern outlined in the “Overloaded Properties”
section[1] invalid.

[1] https://wiki.php.net/rfc/typed_properties_v2#overloaded_properties

Hm, I guess I didn't read that section carefully enough.

It strikes me that that entire code pattern is a hack due to lack of
property accessors, and it seems a shame to reduce the usefulness of the
language's type system just to support it.

From the responses on this thread, I get the feeling I'm in the minority,
but it still feels utterly wrong to me that a property marked as
non-nullable offers no guarantee at all that it will actually hold a valid
value.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Larry Garfield — view source

unread

It seems to me that either of these proposals would render the lazy
initialization pattern outlined in the “Overloaded Properties”
section[1] invalid.

[1] https://wiki.php.net/rfc/typed_properties_v2#overloaded_properties

Hm, I guess I didn't read that section carefully enough.

It strikes me that that entire code pattern is a hack due to lack of
property accessors, and it seems a shame to reduce the usefulness of the
language's type system just to support it.

From the responses on this thread, I get the feeling I'm in the minority,
but it still feels utterly wrong to me that a property marked as
non-nullable offers no guarantee at all that it will actually hold a valid
value.

Regards,

I get the sense that it should be treated as a bug in the implementing code,
not calling code. That is, given:

class Foo {
public Bar $bar;
}

If you call (new Foo)->bar and get a type error, you should not add null
checks but walk over to the Foo author's desk and slap them upside the head.
(Aka, file a bug report.)

I can see the value of punting there, and it's hardly the first time PHP has
punted on something in that fashion, but I agree that the whole point of such
checks is to make sure that the Foo author can't make that mistake in the
first place. That's rather the point of all type hints in PHP. The more
errors are compile time errors the better.

I wouldn't mind dropping that overloading behavior if it gave us more
guarantees about property validity.

--Larry Garfield

7 years ago by netmo.php@gmail.com — view source

unread

I agree with you. If someone really wants to have an "uninitialized" field
on purpose, they should do that using the correct type declaration, i.e.:

?MyType $myNullable = null;

When this was started I asked if it was possible to check types right after
object has been constructed, but they said it was inefficient to do so.
If this is still true, then I'm ok with checking on first access instead. I
don't want even more overhead from runtime type checking.
This is definitely not as bad as the "billion dollar mistake" though. The
error still happens, but in a different place.

7 years ago by Rowan Collins — view source

unread

This is definitely not as bad as the "billion dollar mistake" though. The
error still happens, but in a different place.

That's exactly what happens in most languages, isn't it? An error happens
when you try to resolve a null reference, but by then it's far too late to
do anything about it.

Regards,

Rowan Collins
[IMSoP]

7 years ago by netmo.php@gmail.com — view source

unread

It's more strict in the proposed rfc. In particular in java the error is
allowed to propagate in the program, while here it won't be.
In other words if $foo->aaa is uninitialized, you are not allowed to do $baz->bbb = $foo->aaa.
In java that's allowed, so in java null pointer deref can really happen
anywhere without knowing its origin, and that's really a big problem.

7 years ago by Rowan Collins — view source

unread

It's more strict in the proposed rfc. In particular in java the error is
allowed to propagate in the program, while here it won't be.
In other words if $foo->aaa is uninitialized, you are not allowed to
do $baz->bbb = $foo->aaa.
In java that's allowed, so in java null pointer deref can really happen
anywhere without knowing its origin, and that's really a big problem.

Ah, yes, point taken. I'd overlooked the possibility of copying the
reference without dereferencing it. That is indeed worse.

Although the ability to pass around, duplicate, and manipulate an object in
various ways and then suddenly get an error that should have been thrown in
the constructor, is still pretty unhelpful.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Stanislav Malyshev — view source

unread

Hi!

I agree with you. If someone really wants to have an "uninitialized" field
on purpose, they should do that using the correct type declaration, i.e.:

?MyType $myNullable = null;

When this was started I asked if it was possible to check types right after
object has been constructed, but they said it was inefficient to do so.
If this is still true, then I'm ok with checking on first access instead. I
don't want even more overhead from runtime type checking.
This is definitely not as bad as the "billion dollar mistake" though. The
error still happens, but in a different place.

In Java, avoiding these kinds of errors - when the property is not
initialized - is almost trivial, and that's not what "billion dollar
mistake" is about. Lack of initialization for a final value (which is a
frequent case with property values) is detected by the compiler, and
most IDEs/analysis tools would alert of lack of initialization for
non-final value too. The problem is with usage on null as a substitution
for no value/error/unknown value and this value bleeding out into parts
of the code that don't know how to handle this situation.

--
Stas Malyshev
smalyshev@gmail.com

7 years ago by Rowan Collins — view source

unread

Hi!

I agree with you. If someone really wants to have an "uninitialized"
field
on purpose, they should do that using the correct type declaration,
i.e.:

?MyType $myNullable = null;

When this was started I asked if it was possible to check types right
after
object has been constructed, but they said it was inefficient to do
so.
If this is still true, then I'm ok with checking on first access
instead. I
don't want even more overhead from runtime type checking.
This is definitely not as bad as the "billion dollar mistake" though.
The
error still happens, but in a different place.

In Java, avoiding these kinds of errors - when the property is not
initialized - is almost trivial, and that's not what "billion dollar
mistake" is about. Lack of initialization for a final value (which is a
frequent case with property values) is detected by the compiler, and
most IDEs/analysis tools would alert of lack of initialization for
non-final value too. The problem is with usage on null as a
substitution
for no value/error/unknown value and this value bleeding out into parts
of the code that don't know how to handle this situation.

Fair enough.

To avoid us getting sidetracked, I retract my mistaken comparison to the "billion dollar mistake". It felt like a cute reference to include, but it didn't really help explain my point.

However, I stand by my opinion that introducing an error-on-access state for properties will lead to more problems than it solves, and that if we can't guarantee non-nullable properties are actually non-null, then we shouldn't allow them at all.

Regards,

--
Rowan Collins
[IMSoP]

7 years ago by Nikita Popov — view source

unread

On Sat, Jul 14, 2018 at 3:09 PM, Rowan Collins rowan.collins@gmail.com
wrote:

Hi all,

As briefly mentioned, I think the approach to non-nullable types in the
typed properties proposal needs more discussion, to make sure we're not
repeating Tony Hoare's "billion-dollar mistake".

For parameter hints, nullability is naturally optional, particularly in
PHP where "nullable Foo" is equivalent to a union type "null or Foo". For
properties, this isn't the case, because every property must have some
initial state.

The simplest solution, which I think would be a sensible starting point if
we're not ready for more complex changes, is to say that all typed
properties must have a default value, and if that default value is null,
the type hint must naturally be nullable. In other words, make "public Foo
$foo" illegal, but allow "public ?Foo $foo=null".

The current RFC proposes the next simplest solution, which is to allow
non-nullable types, and trust the user to initialise them before use.

My first objection to this is that it creates an entirely new state for
object properties to be in, which behaves differently from everything else
in the language. There will then be three or four different ways for an
object property to be "unset":

not declared at all; access gives a notice and the implicit value null

declared with no default, uninitialised, implicitly null

declared with a default of null, or explicitly assigned as null

declared with a non-nullable type, never initialised; rather than an
implicit null, access generates an Error

The bigger problem is that the properties are non-nullable in name only,
and the declaration can't be trusted. Consider the following class:

class Validity {
public \DateTimeInterface $validFrom;
public ?\DateTimeInterface $validTo = null;
}

If I have an instance $v of that class, I would expect to be able to
safely call $v->validFrom->getTimestamp(), but need to check for nulls
before doing the same with validTo. Under the current proposal, doing so
will risk errors, so I will have to check both properties before access
anyway. Worse, using is_null() or ?: will attempt to retrieve the value, so
I actually have to be more careful, and use isset() or ?? instead.

I'm just going to reply to this one line, because this is the root of your
misunderstanding:

The user should never, ever, under any circumstances (*) check for the
initialization of a typed property. The user simply uses the property under
the firm knowledge that it will be initialized -- or there is a bug in the
initialization code. If there is a bug in the initialization code then it
needs to be fixed there, not by introducing a check when working with the
object.

This is very, very different from a nullable property, where it is the
users responsibility to handle the null value. If the user receives a null
value it does not indicate an initialization bug and they must write
code to account for it.

Please do also remember that uninitialized properties and unset properties
are the same thing, and during the last proposal there was a strong
consensus that we must retain support for property unsetting to be
compatible with Doctrine and other libraries using proxy objects.

Nikita

(*) Okay, there are exceptions, e.g. introspection libraries ;)

Alternatively, I can trust the author of the class, but at that point they
might as well just use a docblock - the type hint documents their intent,
but is not enforced.

Swift is often cited as a language which gets nullability right, and a lot
of attention is given to Options, and the compiler ensuring that the None /
null case is handled; but at least as important is how it handles
initialisation of non-nullable properties, using "two-phase
initialisation": https://docs.swift.org/swift-b
ook/LanguageGuide/Initialization.html

In short, Swift defines a specific point after which all properties must
have a valid value; before this point, the entire object is invalid for
read operations, so there is no need for a specific error when accessing
uninitialised properties.

Swift marks this point by the call to the parent initialiser, ultimately
going up the chain to the root object. PHP has no root object, and no
mandatory parent constructor calls, so would need a different way to mark
this stage. Constructors can already violate the rules the first phase
should impose, so we can't just use the end of the constructor as the
validation point.

One possibility would be to add a new keyword like "initialize" which must
be added to constructors in the presence of non-nullable properties. Above
that keyword, use of $this other than to write to its properties would be
an error; afterwards, the constructor could carry on as normal.

As I say, this is complex, and it may be best to add nullable typed
properties first, and give more time to try out approaches to
initialisation.

Regards,

Rowan Collins
[IMSoP]

7 years ago by Rowan Collins — view source

unread

On Sat, Jul 14, 2018 at 3:09 PM, Rowan Collins rowan.collins@gmail.com
wrote:

If I have an instance $v of that class, I would expect to be able to

safely call $v->validFrom->getTimestamp(), but need to check for nulls
before doing the same with validTo. Under the current proposal, doing so
will risk errors, so I will have to check both properties before access
anyway. Worse, using is_null() or ?: will attempt to retrieve the value, so
I actually have to be more careful, and use isset() or ?? instead.

I'm just going to reply to this one line, because this is the root of your
misunderstanding:

The user should never, ever, under any circumstances (*) check for the
initialization of a typed property.

Agreed. Which is why the language should guarantee that they don't need to.

The user simply uses the property under the firm knowledge that it will be
initialized -- or there is a bug in the initialization code. If there is a
bug in the initialization code then it needs to be fixed there, not by
introducing a check when working with the object.

Absolutely. But in order for that to happen, the error needs to be shown to
the author of that code, not to some downstream user who will have no way
to fix it.

As I mentioned before, I see no practical difference between seeing this
comment in a library...

/**

You can rely on this property being initialised to a non-null value in
the constructor
*/
public ?Foo $foo = null;

... and seeing this declaration under the current proposal ...

public Foo $foo;

In both cases, as the user of the library, I'm entirely at the mercy of the
library author to implement the documented behaviour. If I trust them, and
use the property without checking it, I risk errors if they have made a
mistake, which I can't fix.

Please do also remember that uninitialized properties and unset properties
are the same thing

No they're not, an unset property doesn't raise an error when you try to
access it; see my last e-mail to Marco.

Regards,

Rowan Collins
[IMSoP]