New handler for retrieving properties when object is serialized

11 years ago by Jakub Zelenka — view source

unread

Hi,

I have just sent this PR: https://github.com/php/php-src/pull/397

It's reaction to the comment from nikic in PR:
https://github.com/php/php-src/pull/393

The patch is about adding new object handler that is used when object is
serialized. Currently this needs to be done using get_properties which is
sometimes a bit over kill.

It's useful in situation if you only need to change properties for
serialization. It's very similar to get_debug_info that is used only in
specific situations (print_r...)

Currently I call it get_serialize_info . It was the first name that came to
my mind. :) I am happy to change if someone has a better idea?

Any thoughts?

Jakub

11 years ago by Stas Malyshev — view source

unread

Hi!

It's useful in situation if you only need to change properties for
serialization. It's very similar to get_debug_info that is used only in
specific situations (print_r...)

Isn't __sleep already doing this?

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Nikita Popov — view source

unread

On Thu, Jul 25, 2013 at 9:09 PM, Stas Malyshev smalyshev@sugarcrm.comwrote:

Hi!

It's useful in situation if you only need to change properties for
serialization. It's very similar to get_debug_info that is used only in
specific situations (print_r...)

Isn't __sleep already doing this?

__sleep returns an array of object property names to be serialized. Using
__sleep you can restrict the set of serialized properties, but the
properties still need to exist.

Nikita

11 years ago by Stas Malyshev — view source

unread

Hi!

__sleep returns an array of object property names to be serialized.
Using __sleep you can restrict the set of serialized properties, but the
properties still need to exist.

You can create those properties when processing __sleep. Nothing
requires them to exist before __sleep was called.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Jakub Zelenka — view source

unread

On Thu, Jul 25, 2013 at 8:56 PM, Stas Malyshev smalyshev@sugarcrm.comwrote:

You can create those properties when processing __sleep. Nothing
requires them to exist before __sleep was called.

The problem is that properties will exist after __sleep was called. :)
There are situations that you don't want them to exist (for example
DateTime).

11 years ago by johannes@schlueters.de — view source

unread

On Thu, Jul 25, 2013 at 8:56 PM, Stas Malyshev smalyshev@sugarcrm.comwrote:

You can create those properties when processing __sleep. Nothing
requires them to exist before __sleep was called.

The problem is that properties will exist after __sleep was called. :)
There are situations that you don't want them to exist (for example
DateTime).

What about using our Serializable interface? - That was invented to
serialize internal objects with arbitrary internal data.

johannes

11 years ago by Jakub Zelenka — view source

unread

What about using our Serializable interface? - That was invented to
serialize internal objects with arbitrary internal data.

This is a bit painful to generate and then parse (unserialize) string if
you do it in C... Especially if you need only add few properties that will
contain data for serialization. In that case Serailizable is a bit
overkill...

And for my use case (DateTime serialization) it would be BC as well.
Different format (Serializable always generate Custom string).

Jakub

11 years ago by johannes@schlueters.de — view source

unread

This is a bit painful to generate and then parse (unserialize) string
if you do it in C... Especially if you need only add few properties
that will contain data for serialization. In that case Serailizable is
a bit overkill...

So rather change the engine, which affects way more things and add yet
another serialisation hook making it even more confusing what to use?

And for my use case (DateTime serialization) it would be BC as well.
Different format (Serializable always generate Custom string).

You're solution also depends on an API break and also changes the way
the objects behave during serialization/unserialization.

Doesn't convince me.

johannes

11 years ago by Jakub Zelenka — view source

unread

So rather change the engine, which affects way more things and add yet
another serialisation hook making it even more confusing what to use?

Yes I agree that the new hook is confusing. I think that what Gustavo said
would be actually much better. Having some flag that would say what is
the purpose of retrieving properties could be very useful. It's not just
serialization
but also casting object to array. There can be even more use cases
in the future. On top of it the hooks like get_gc and get_debug_info
could be part of it. But I would rather discuss this leter
when I send a patch.

You're solution also depends on an API break and also changes the way
the objects behave during serialization/unserialization.

Not sure what you mean by API break and changing the way the objects
behave? When you look to this pach
https://github.com/bukka/php-src/compare/date_serialize there shouldn't be
any difference in object behaviour. Or am I wrong?

By BC I meant the resulted string that you get after using serialize
function. If you use Serializable you get something that is prefixed by C:
. But if you have object, the prefix is O: . Please see this
http://3v4l.org/Sd8aT .

Jakub

11 years ago by johannes@schlueters.de — view source

unread

So rather change the engine, which affects way more things and add yet
another serialisation hook making it even more confusing what to use?

Yes I agree that the new hook is confusing. I think that what Gustavo said
would be actually much better. Having some flag that would say what is
the purpose of retrieving properties could be very useful. It's not just
serialization
but also casting object to array. There can be even more use cases
in the future. On top of it the hooks like get_gc and get_debug_info
could be part of it. But I would rather discuss this leter
when I send a patch.

This means one has to touch every extension using those hooks. "more use
cases in the future" means having to do this over and over again, for a
feature I see little benefit in. Serializable was added precisely for
the purpose to allow arbitrary internal classes to provide serializer
logic.

You're solution also depends on an API break and also changes the way
the objects behave during serialization/unserialization.

Not sure what you mean by API break

Breaking PHP's API. Requiring modifications to other extensions.

and changing the way the objects
behave? When you look to this pach
https://github.com/bukka/php-src/compare/date_serialize there shouldn't be
any difference in object behaviour. Or am I wrong?

I haven't looked deeper but isn't your whole point to change the
serialized representation of the object, to include all data "properly"?

By BC I meant the resulted string that you get after using serialize
function. If you use Serializable you get something that is prefixed by C:
. But if you have object, the prefix is O: . Please see this
http://3v4l.org/Sd8aT .

I know the difference. So yes, new serialized files can't be properly
read on old PHP.

johannes

11 years ago by Jakub Zelenka — view source

unread

This means one has to touch every extension using those hooks. "more use
cases in the future" means having to do this over and over again, for a
feature I see little benefit in.

I don't want to change get_properties. That would break too many
extensions. The idea is to create something like

HashTable *get_special_properties(zval *object, int purpose, int *is_tmp
TSRMLS_DC)

This would replace get_debug_info so the number of hooks would stay the
same.
The only extensions that would need to be changed are the ones that use
get_debug_info. The purposes could be debug, serialize, array_casting. If
there is
a new use case, you would just add new purpose and wouldn't have to change
anything.

I haven't looked deeper but isn't your whole point to change the
serialized representation of the object, to include all data "properly"?

No the whole point was returning the same properties but only if the object
is
serialized or debugged. Deeper explanation is in this thread:

http://news.php.net/php.internals/68305

Jakub

11 years ago by johannes@schlueters.de — view source

unread

This means one has to touch every extension using those hooks. "more use
cases in the future" means having to do this over and over again, for a
feature I see little benefit in.

I don't want to change get_properties. That would break too many
extensions. The idea is to create something like

HashTable *get_special_properties(zval *object, int purpose, int *is_tmp
TSRMLS_DC)

This would replace get_debug_info so the number of hooks would stay the
same.
The only extensions that would need to be changed are the ones that use
get_debug_info. The purposes could be debug, serialize, array_casting. If
there is a new use case, you would just add new purpose and wouldn't have to change
anything.

Such a way for overloading is bad. This is painful on each update. If
there are different things to do use different hooks. The
implementations might call the same implementation but a "purpose"
option leads to a long term maintenance hell whenever there is a
change.(a tiny bit better would be using an enum instead of int as in
some situations compilers/analyzers might give a hint ... but still a
mess)

But I still don't buy the argument for this hook.

Typically internal classes use some C data structure for keeping their
internal state. Often actually in a form that it can not be represented
in zval. From that observation (after a few years of looking at phpsrc,
pecl external and close-source exts) Serializable is the preferred way
to serialize these in an efficient way (converting everything in zvals,
then serialize, finally freeing temporary data is a quite expensive
way ..)
(mind: comparison with debug hooks is only of limited use. a) a debug
feature has less performance relevance b) a debug feature doesn't need
to convert the full state and has no need to be reversible)

So what is being proposed is to extend our already too large API with
yet another way of doing something for edge cases for which we have a
good solution for the general case already.

I haven't looked deeper but isn't your whole point to change the
serialized representation of the object, to include all data "properly"?

No the whole point was returning the same properties but only if the object
is serialized or debugged. Deeper explanation is in this thread:

So the purpose is to have a different serialized representation, so
there would be compatibility issues, similar to going to Serializable.

johannes

11 years ago by Jakub Zelenka — view source

unread

On Fri, Jul 26, 2013 at 2:59 PM, Johannes Schlüter
johannes@schlueters.dewrote:

Such a way for overloading is bad. This is painful on each update. If
there are different things to do use different hooks. The
implementations might call the same implementation but a "purpose"
option leads to a long term maintenance hell whenever there is a
change.(a tiny bit better would be using an enum instead of int as in
some situations compilers/analyzers might give a hint ... but still a
mess)

Ok so probably better not to do that :)

But I still don't buy the argument for this hook.

Typically internal classes use some C data structure for keeping their
internal state. Often actually in a form that it can not be represented
in zval. From that observation (after a few years of looking at phpsrc,
pecl external and close-source exts) Serializable is the preferred way
to serialize these in an efficient way (converting everything in zvals,
then serialize, finally freeing temporary data is a quite expensive
way ..)

I agree that Serializable is better for performance. The reason why
I proposed this is to differentiate serialization properties from normal
properties in cases where Serializable would require lots
of coding for a small benefit (object is not serialized often)
or it is not an options (DateTime - see bellow).

So the purpose is to have a different serialized representation, so
there would be compatibility issues, similar to going to Serializable.

Yeah but that serialized representation is exactly the same as
the current representation (the serialized strings before the patch is
exactly the same as the string after the patch :)). The difference is that
get_properties returns empty HashTable now. The serialization is
backward compatible though.

I just want to find a solution for this problem. What would you do?
Would you use Serialazable even if it breaks apps after updating PHP.
Or would you still use get_properties (possibly with testing
serialize_lock) ?

Jakub

11 years ago by Gustavo Lopes — view source

unread

Such a way for overloading is bad. This is painful on each update. If
there are different things to do use different hooks. The
implementations might call the same implementation but a "purpose"
option leads to a long term maintenance hell whenever there is a
change.(a tiny bit better would be using an enum instead of int as in
some situations compilers/analyzers might give a hint ... but still a
mess)

This is a gross misrepresentation. The vast majority of extensions will
not care about this purpose, just like most extensions do not care about
get_debug_info or get_gc or about the purpose in read_property. If a new
purpose is added, this would not be any different than adding a new hook
as long as:

the extension doesn't purposefully break forward compatibility, for
example by doing switch(purpose) { ... default: panic(); } and
the new hook would default to get_properties, either by explicitly
setting the pointer to get_properties in the standard handlers array or
by having the engine fall back to get_properties (the last one was the
solution adopted for get_debug_info and get_gc afaik).

Yes, this would break extensions. But we never made any promises about
source BC for extensions. In fact, we actually break it for every
release, sometimes in rather subtle ways like in 5.4 (due to interned
strings, and object properties storage).

So what is being proposed is to extend our already too large API with
yet another way of doing something for edge cases for which we have a
good solution for the general case already.

Alternatively, could the problem perhaps be mitigated by making
available some utility functions for serializing native types?

--
Gustavo Lopes

11 years ago by Stas Malyshev — view source

unread

Hi!

Yes, this would break extensions. But we never made any promises about
source BC for extensions. In fact, we actually break it for every

Actually, we try to preserve source BC for extensions, unless they do
some weird things or they go very deep into the implementation details.
"Regular" extension generally should not suffer from adding a feature.

Alternatively, could the problem perhaps be mitigated by making
available some utility functions for serializing native types?

We could definitely make some parts of serializer API-accessible, or
even make serializer as a whole to be more API-friendly, I think it'd be
a nice idea if making implementing Serializable would be made easy by
reusing serializer code and combining pieces. If now it's hard making it
easier definitely a good idea, much better than creating one more API IMHO.

Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Jakub Zelenka — view source

unread

Alternatively, could the problem perhaps be mitigated by making
available some utility functions for serializing native types?

We could definitely make some parts of serializer API-accessible, or
even make serializer as a whole to be more API-friendly, I think it'd be
a nice idea if making implementing Serializable would be made easy by
reusing serializer code and combining pieces. If now it's hard making it
easier definitely a good idea, much better than creating one more API IMHO.

I was thinking about that and it could actually be done in a way that would
resolve the problem. I have just created RFC that specify the API.

https://wiki.php.net/rfc/internal_serialize_api

It's backward compatible and allow portable object generation in more
abstract way.

It's just draft so I am happy to do changes.

Jakub

11 years ago by Michael Wallner — view source

unread

Alternatively, could the problem perhaps be mitigated by making
available some utility functions for serializing native types?

We could definitely make some parts of serializer API-accessible, or
even make serializer as a whole to be more API-friendly, I think it'd be
a nice idea if making implementing Serializable would be made easy by
reusing serializer code and combining pieces. If now it's hard making it
easier definitely a good idea, much better than creating one more API IMHO.

I was thinking about that and it could actually be done in a way that would
resolve the problem. I have just created RFC that specify the API.

https://wiki.php.net/rfc/internal_serialize_api

It's backward compatible and allow portable object generation in more
abstract way.

What's about unserialization?
Where does the state come from when recursively calling serialize?

--
Regards,
Mike

11 years ago by Jakub Zelenka — view source

unread

What's about unserialization?

Yeah that's actually a good point! :) It could be handled by implicit
object unserialization if the unserialize callback is NULL. However this is
not optimal for performance and also create properties in the object which
doesn't have to be necessary. There should be functions that help with
parsing the string buffer that is passed to unserialize class callback.
Sort of opposite to API for serialization.

It should be part of RFC. I will add it.

Where does the state come from when recursively calling serialize?

I am not sure if I understand the question. The serialize_state will be a
sort of wrapper for smart_str with some additions for checking whether the
functions are called in correct order. The state instance will be in most
cases a local variable (as it is in the example). In case you try to
serialize object zval (using php_serailize_property_zval), it would try to
serialize the object the same way and the resulted string would be added to
the state buffer (will be in RFC...). Did I answer your question?

I will try to clarify the RFC a bit. Any feedback would be much
appreciated. Thanks.

Jakub

11 years ago by Stas Malyshev — view source

unread

Hi!

This is a bit painful to generate and then parse (unserialize) string if
you do it in C... Especially if you need only add few properties that
will contain data for serialization. In that case Serailizable is a bit
overkill...

And for my use case (DateTime serialization) it would be BC as well.
Different format (Serializable always generate Custom string).

I think given that we already have not one, but two interfaces for
dealing with serialization (__sleep and Serializable), adding a third
one that does essentially the same but with a little difference would be
really too much.

--
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
(408)454-6900 ext. 227

11 years ago by Gustavo Lopes — view source

unread

I have just sent this PR: https://github.com/php/php-src/pull/397

It's reaction to the comment from nikic in PR:
https://github.com/php/php-src/pull/393

The patch is about adding new object handler that is used when object is
serialized. Currently this needs to be done using get_properties which is
sometimes a bit over kill.

It's useful in situation if you only need to change properties for
serialization. It's very similar to get_debug_info that is used only in
specific situations (print_r...)

This seems too narrow a situation to warrant a new serialization method,
in addition to the two we already have.

That said, there are two precedents:

get_debug_info (variant for var_dump)
get_gc (variant for garbage collection)

And anyway, if you really want to add special properties for
serialization, you can use the same hack that was used before get_gc was
available:

http://lxr.php.net/xref/PHP_5_3/ext/spl/spl_observer.c#297
(and http://lxr.php.net/xref/PHP_5_3/ext/spl/spl_observer.c#258 )

You can check for BG(serialize_lock) instead of GC_G(gc_active)

What I think is that we should add an extra parameter to get_properties
instead of keeping adding these.

--
Gustavo Lopes

11 years ago by Jakub Zelenka — view source

unread

You can check for BG(serialize_lock) instead of GC_G(gc_active)

What if there was an extension that call php_var_serialize without locking?
I know there probably isn't any but still I don't think that this is a nice
way how to provide data for serialization.

What I think is that we should add an extra parameter to get_properties
instead of keeping adding these.

That would be nice. I will send a patch later and then we can discuss it...
;)

11 years ago by Gustavo Lopes — view source

unread

You can check for BG(serialize_lock) instead of GC_G(gc_active)

What if there was an extension that call php_var_serialize without locking?
I know there probably isn't any but still I don't think that this is a nice
way how to provide data for serialization.

I think not using PHP_VAR_SERIALIZE_INIT() would always be an error in
the extension anyway. However, you are right this method is not "nice"
by any means.

--
Gustavo Lopes