[RFC] Add WeakMap - Externals

5 years ago by Nikita Popov

Hi internals,

This is a follow up to the addition of WeakReference in PHP 7.4.
WeakReference is an important primitive, but what people usually really
need are weak maps, which can't be implemented on top of WeakReference (at
least, not as exposed in PHP).

This RFC proposes to add a native WeakMap type for PHP 8:
https://wiki.php.net/rfc/weak_maps

Regards,
Nikita

5 years ago by Benjamin Morel

Hi Nikita,

After reading the RFC, I have no comments to make, but I just want to thank
you for working on this. I regretted a lot that this wasn't implemented
together with WeakReference in PHP 7.4 https://externals.io/message/106373,
as the use cases for WeakReference vs WeakMap are really narrow.

Cheers,
Benjamin

5 years ago by Nikita Popov

Any comments on this proposal? Otherwise this could head to voting...

Nikita

5 years ago by Dennis Birkholz

Hi Nikita,

On 04.12.19 at 19:50, Nikita Popov wrote:

This RFC proposes to add a native WeakMap type for PHP 8:
https://wiki.php.net/rfc/weak_maps

Any comments on this proposal? Otherwise this could head to voting...

Thanks for this proposal, it will be really helpful!

The only caveat for me is that WeakMap is not serializable. Wouldn't it
be possible to allow serialization by just serializing it as an array
with all objects that are still valid? This would avoid serialization
errors and make using it with serialization very easy. It would not
unserialize as a WeakMap but that is not that great a problem for me.

Or maybe it could extend SplObjectStorage (or implement the same [new]
interface) and could be serialized as an SplObjectStorage object instead
of an array.

Thanks for your great work, keep it up!

Greets
Dennis

5 years ago by Nikita Popov

On Thu, Dec 5, 2019 at 6:09 PM Dennis Birkholz php@dennis.birkholz.biz
wrote:

Hi Nikita,

On 04.12.19 at 19:50, Nikita Popov wrote:

This RFC proposes to add a native WeakMap type for PHP 8:
https://wiki.php.net/rfc/weak_maps

Any comments on this proposal? Otherwise this could head to voting...

Thanks for this proposal, it will be really helpful!

The only caveat for me is that WeakMap is not serializable. Wouldn't it
be possible to allow serialization by just serializing it as an array
with all objects that are still valid? This would avoid serialization
errors and make using it with serialization very easy. It would not
unserialize as a WeakMap but that is not that great a problem for me.

This is not possible. Classes always have to serialize to themselves ;)

Or maybe it could extend SplObjectStorage (or implement the same [new]
interface) and could be serialized as an SplObjectStorage object instead
of an array.

Could you provide some context on why you think serialization support for
WeakMap is important? As weak maps are essentially caching structures,
serializing them doesn't seem particularly useful in the first place, but
when combined with the quite unintuitive behavior the serialization would
have, I feel that it is better to leave this to the user (same as
WeakReference).

Specifically what I mean by unintuitive is this: When you do a $s =
serialize($weakMap), you'll get back a large payload string, but when you
then try to do an unserialize($s) you'll get back an empty WeakMap (or
worse: a weak map that will only become empty on the next GC), because all
of those objects will get removed as soon as unserialization is finalized.
That "works", but doesn't seem terribly useful and is likely going to be a
WTF moment.
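
To illustrate the mechanism behind this, here is a minimal sketch, assuming PHP 8.0 or later where the proposed WeakMap class is available. An entry lives only as long as something else holds its key object, which is why the keys of a freshly unserialized map would have nothing keeping them alive. The variable names are purely illustrative.

```php
<?php
// Assumes PHP 8.0+, where WeakMap exists.
$map = new WeakMap();
$key = new stdClass();

// Attach a value to the object; the map holds the key weakly.
$map[$key] = 'cached metadata';
$before = count($map);

// Drop the only strong reference to the key object.
// The object is destroyed and its entry leaves the map with it.
unset($key);
$after = count($map);
```

Here $before counts the single live entry and $after counts none, since nothing outside the map references the key any more.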

Nikita

5 years ago by Dennis Birkholz

Hi Nikita,

Could you provide some context on why you think serialization support for
WeakMap is important? As weak maps are essentially caching structures,
serializing them doesn't seem particularly useful in the first place, but
when combined with the quite unintuitive behavior the serialization would
have, I feel that it is better to leave this to the user (same as
WeakReference).

Structures provided by the PHP core tend to be used in the wild. As PHP
lacks a method to check whether a given object (and all other objects
contained within it) can be serialized (without traversing the complete
object graph), each new always-available data structure that is not
serializable increases the risk of encountering an object that is not
serializable. That is the reason I prefer new data structures to be
serializable.

What I see coming is something like this: some kind of object that can
contain other objects and attach some meta information to those objects
(that is stored as the value in a weak map). When an object is removed
from the collection, the meta information gets removed eventually, with no
need to manually clear it in the remove-object method. If there are many
different kinds of meta information this would save a lot of code in the
remove method! Even though this seems not to be the intended use case,
programmers tend to save some keystrokes here and there. That type of
container object is not serializable unless the programmer takes extra
steps and implements serialization themselves.
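
A rough sketch of the kind of container described here, assuming PHP 8.0 or later. The class name TaggedCollection and all of its methods are hypothetical and only serve as illustration; the __serialize()/__unserialize() pair stands in for the "extra steps" a programmer would have to take, since the WeakMap itself cannot be serialized.

```php
<?php
// Hypothetical container: holds objects strongly, keeps per-object
// metadata in a WeakMap so it needs no manual cleanup on removal.
final class TaggedCollection
{
    /** @var array<int, object> */
    private array $items = [];
    private WeakMap $tags;

    public function __construct()
    {
        $this->tags = new WeakMap();
    }

    public function add(object $item, string $tag): void
    {
        $this->items[spl_object_id($item)] = $item;
        $this->tags[$item] = $tag;
    }

    public function remove(object $item): void
    {
        // Only the strong reference is dropped here. The tag disappears
        // once no one else references $item any more.
        unset($this->items[spl_object_id($item)]);
    }

    public function tagOf(object $item): ?string
    {
        return isset($this->tags[$item]) ? $this->tags[$item] : null;
    }

    /** @return object[] */
    public function items(): array
    {
        return array_values($this->items);
    }

    // Manual serialization: store (object, tag) pairs instead of the WeakMap.
    public function __serialize(): array
    {
        $pairs = [];
        foreach ($this->items as $item) {
            $pairs[] = [$item, $this->tagOf($item)];
        }
        return ['pairs' => $pairs];
    }

    public function __unserialize(array $data): void
    {
        $this->items = [];
        $this->tags = new WeakMap();
        foreach ($data['pairs'] as [$item, $tag]) {
            $this->items[spl_object_id($item)] = $item;
            if ($tag !== null) {
                $this->tags[$item] = $tag;
            }
        }
    }
}
```

With these two methods in place, unserialize(serialize($collection)) would rebuild the collection with a fresh WeakMap whose keys are kept alive by the restored item list.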

Specifically what I mean by unintuitive is this: When you do a $s =
serialize($weakMap), you'll get back a large payload string, but when you
then try to do an unserialize($s) you'll get back an empty WeakMap (or
worse: a weak map that will only become empty on the next GC), because all
of those objects will get removed as soon as unserialization is finalized.
That "works", but doesn't seem terribly useful and is likely going to be a
WTF moment.

Ok, my intention was to have a more sophisticated approach: when the
WeakMap is serialized, only objects in the object graph that is
serialized are considered alive, all other objects are not serialized.
So directly serializing a WeakMap would result in an empty map but
serializing an object that contains a list of objects and a WeakMap
containing some of the same child objects would create a meaningful
payload string and unserialize would reconstruct the same object with
only the objects from the child list still available in the WeakMap.

I understand that this may complicate the implementation a lot (or even
not be possible). But I just want to repeat my main concern: built-in
data structures that are not serializable are a real problem for users
that use serialization extensively. Maybe the solution to that problem
is a method to check whether a provided object graph can be serialized
(which may not be possible due to throwing an exception in __sleep() or
something like that), some way to ignore unserializable elements or some
way to register callback methods to handle unserializable elements.

Anyway, thanks for taking your time and for bringing this proposal forward!

Greets
Dennis

5 years ago by Nikita Popov

On Tue, Dec 10, 2019 at 12:03 PM Dennis Birkholz php@dennis.birkholz.biz
wrote:

Hi Nikita,

Could you provide some context on why you think serialization support for
WeakMap is important? As weak maps are essentially caching structures,
serializing them doesn't seem particularly useful in the first place, but
when combined with the quite unintuitive behavior the serialization would
have, I feel that it is better to leave this to the user (same as
WeakReference).

Structures provided by the PHP core tend to be used in the wild. As PHP
lacks a method to check whether a given object (and all other objects
contained within it) can be serialized (without traversing the complete
object graph), each new always-available data structure that is not
serializable increases the risk of encountering an object that is not
serializable. That is the reason I prefer new data structures to be
serializable.

What I see coming is something like this: some kind of object that can
contain other objects and attach some meta information to those objects
(that is stored as the value in a weak map). When an object is removed
from the collection, the meta information gets removed eventually, with no
need to manually clear it in the remove-object method. If there are many
different kinds of meta information this would save a lot of code in the
remove method! Even though this seems not to be the intended use case,
programmers tend to save some keystrokes here and there. That type of
container object is not serializable unless the programmer takes extra
steps and implements serialization themselves.

Specifically what I mean by unintuitive is this: When you do a $s =
serialize($weakMap), you'll get back a large payload string, but when you
then try to do an unserialize($s) you'll get back an empty WeakMap (or
worse: a weak map that will only become empty on the next GC), because all
of those objects will get removed as soon as unserialization is finalized.
That "works", but doesn't seem terribly useful and is likely going to be a
WTF moment.

Ok, my intention was to have a more sophisticated approach: when the
WeakMap is serialized, only objects in the object graph that is
serialized are considered alive, all other objects are not serialized.
So directly serializing a WeakMap would result in an empty map but
serializing an object that contains a list of objects and a WeakMap
containing some of the same child objects would create a meaningful
payload string and unserialize would reconstruct the same object with
only the objects from the child list still available in the WeakMap.

I understand that this may complicate the implementation a lot (or even
not be possible).

This is indeed not possible. When we serialize the WeakMap, we do not know
what else will be serialized as well. We can only serialize everything and
let unserialization discard objects that are no longer live.

But I just want to repeat my main concern: built-in
data structures that are not serializable are a real problem for users
that use serialization extensively. Maybe the solution to that problem
is a method to check whether a provided object graph can be serialized
(which may not be possible due to throwing an exception in __sleep() or
something like that), some way to ignore unserializable elements or some
way to register callback methods to handle unserializable elements.

I must be missing something obvious here: Isn't this a reliable way to
detect whether an object graph is serializable?

try {
    $serialized = serialize($value);
} catch (\Throwable $e) {
    // not serializable
}

Nikita

5 years ago by Dennis Birkholz

On Tue, Dec 10, 2019 at 12:03 PM Dennis Birkholz php@dennis.birkholz.biz
wrote:

But I just want to repeat my main concern: built-in
data structures that are not serializable are a real problem for users
that use serialization extensively. Maybe the solution to that problem
is a method to check whether a provided object graph can be serialized
(which may not be possible due to throwing an exception in __sleep() or
something like that), some way to ignore unserializable elements or some
way to register callback methods to handle unserializable elements.

I must be missing something obvious here: Isn't this a reliable way to
detect whether an object graph is serializable?

try {
    $serialized = serialize($value);
} catch (\Throwable $e) {
    // not serializable
}

Checking whether the serialization process worked is different from
checking beforehand whether an object graph is serializable, as the latter
makes it a lot easier to display a meaningful error message. Something like
the Serializable interface in Java. But that does not seem to be the PHP way
so far. I will have to think about an RFC for a second parameter for
serialize() that accepts a callback that can "serialize"
objects/variables that fail to serialize themselves...

Greets
Dennis

5 years ago by Nikita Popov

On Mon, Dec 16, 2019 at 10:19 AM Dennis Birkholz php@dennis.birkholz.biz
wrote:

On Tue, Dec 10, 2019 at 12:03 PM Dennis Birkholz php@dennis.birkholz.biz
wrote:

But I just want to repeat my main concern: built-in
data structures that are not serializable are a real problem for users
that use serialization extensively. Maybe the solution to that problem
is a method to check whether a provided object graph can be serialized
(which may not be possible due to throwing an exception in __sleep() or
something like that), some way to ignore unserializable elements or some
way to register callback methods to handle unserializable elements.

I must be missing something obvious here: Isn't this a reliable way to
detect whether an object graph is serializable?

try {
    $serialized = serialize($value);
} catch (\Throwable $e) {
    // not serializable
}

Checking whether the serialization process worked is different from
checking beforehand whether an object graph is serializable, as the latter
makes it a lot easier to display a meaningful error message. Something like
the Serializable interface in Java. But that does not seem to be the PHP way
so far. I will have to think about an RFC for a second parameter for
serialize() that accepts a callback that can "serialize"
objects/variables that fail to serialize themselves...

Could you please explain in more detail what the practical distinction
between checking beforehand and catching an exception is? It seems like the
exception should be sufficient to display a meaningful error message --
heck, it already contains an error message you can use.
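
As a small sketch of what this looks like in practice, assuming PHP 8.0 or later, where an attempt to serialize a WeakMap is rejected with an exception. The $graph structure is made up for illustration.

```php
<?php
// Hypothetical object graph with a WeakMap buried inside it.
$graph = [
    'name'  => 'example',
    'cache' => new WeakMap(),
];

$payload = null;
$error = null;
try {
    $payload = serialize($graph);
} catch (\Throwable $e) {
    // The message names the class that refused to serialize,
    // so it can be shown to the user or logged as is.
    $error = $e->getMessage();
}
```

After this runs, $payload stays null and $error carries the message from the exception.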

Nikita

5 years ago by Dennis Birkholz

On Mon, Dec 16, 2019 at 10:19 AM Dennis Birkholz php@dennis.birkholz.biz
wrote:

On Tue, Dec 10, 2019 at 12:03 PM Dennis Birkholz php@dennis.birkholz.biz
wrote:

But I just want to repeat my main concern: built-in
data structures that are not serializable are a real problem for users
that use serialization extensively. Maybe the solution to that problem
is a method to check whether a provided object graph can be serialized
(which may not be possible due to throwing an exception in __sleep() or
something like that), some way to ignore unserializable elements or some
way to register callback methods to handle unserializable elements.

I must be missing something obvious here: Isn't this a reliable way to
detect whether an object graph is serializable?

try {
    $serialized = serialize($value);
} catch (\Throwable $e) {
    // not serializable
}

Checking whether the serialization process worked is different from
checking beforehand whether an object graph is serializable, as the latter
makes it a lot easier to display a meaningful error message. Something like
the Serializable interface in Java. But that does not seem to be the PHP way
so far. I will have to think about an RFC for a second parameter for
serialize() that accepts a callback that can "serialize"
objects/variables that fail to serialize themselves...

Could you please explain in more detail what the practical distinction
between checking beforehand and catching an exception is? It seems like the
exception should be sufficient to display a meaningful error message --
heck, it already contains an error message you can use.

The practical distinction would be that the error is raised at the moment I
build the object graph and assign a value that is not serializable, and
not at the moment I try to serialize. If my object graph is very complex,
finding the culprit is a lot easier that way.

But the typical PHP way is not to force each and every object in the graph
to implement some interface that makes each object serialization safe
(like in the Java Serializable case).

From my side there is no need to continue this discussion. There seems
to be no easy and sensible way to make WeakMap serializable so I rest my
case.

Nevertheless, I appreciate that you took the time to discuss this, Nikita.