Back in 2014 there was an informal proposal on the mailing list to replace
PHP serialization with an efficient binary (msgpack) format.
https://www.mail-archive.com/internals@lists.php.net/msg69870.html
As argued back then, breaking serialize() and unserialize() is unacceptable
for many reasons.
Meanwhile, the default session-storage, third-party cache libraries, etc.
continue to use serialize() and unserialize(), with better options being
available only outside of the standard PHP run-times.
Why don't we either:
- adopt msgpack_pack() and msgpack_unpack() from the PECL package as standard
  functions, or
- add a new pair of functions, e.g. bin_serialize() and bin_unserialize(),
  to supersede the existing functions
Or possibly both - that is, alias the pack/unpack functions with names
indicating they supersede the old un/serialize functions, but document the
new bin_* functions are using "an unspecified internal format". This way,
people can elect to use msgpack binary format now and in the future, or
elect to use msgpack now and possibly a different format in the future.
Optionally the bin_* functions could lead with a version byte (or maybe a
4-byte header) allowing backwards compatibility with future binary formats.
This way we don't risk ending up in the same situation again in 10 years if
we realize msgpack is bad for serialization for any reason.
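A minimal sketch of the version-byte idea, written in Python for illustration only. bin_serialize() and bin_unserialize() here are hypothetical stand-ins for the proposed PHP functions, and JSON (plain for version 1, zlib-compressed for version 2) substitutes for msgpack, which is not in Python's standard library. The first byte selects the decoder, so older payloads remain readable after the default format changes.

```python
import json
import zlib


def _json_encode(value):
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def _json_decode(data):
    return json.loads(data.decode("utf-8"))


# Hypothetical registry: version byte -> (encoder, decoder).
# JSON stands in for "msgpack now"; compressed JSON for "a different format later".
FORMATS = {
    1: (_json_encode, _json_decode),
    2: (lambda v: zlib.compress(_json_encode(v)),
        lambda b: _json_decode(zlib.decompress(b))),
}
CURRENT_VERSION = 2


def bin_serialize(value, version=CURRENT_VERSION):
    """Prefix the encoded payload with a single version byte."""
    encode, _ = FORMATS[version]
    return bytes([version]) + encode(value)


def bin_unserialize(data):
    """Dispatch on the leading version byte so older payloads stay readable."""
    if not data:
        raise ValueError("empty payload")
    version = data[0]
    if version not in FORMATS:
        raise ValueError(f"unknown format version {version}")
    _, decode = FORMATS[version]
    return decode(data[1:])
```

A payload written with version=1 still decodes after CURRENT_VERSION moves to 2, because bin_unserialize() reads the stored byte rather than assuming the current default. A 4-byte header (for example a magic marker plus a version) would follow the same pattern while also making it easier to reject data that was never produced by these functions.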
There are many other uses for a set of efficient (space and time)
un/serialization functions, for example when packing small payloads
(checksums, image dimensions, etc.) into URLs, when persisting data in
cookies, etc.
Thoughts?
-----Original Message-----
From: Rasmus Schultz [mailto:rasmus@mindplay.dk]
Sent: Wednesday, 28 June 2017 11:04
To: PHP internals
Subject: [PHP-DEV] Binary (msgpack) serialize/unserialize
- adopt msg_pack() and msg_unpack() from the PECL package as standard
functions, or
Hi Rasmus,
We use msgpack for serialization on a high-traffic site and while I stand by our choice for the format, the implementation in the PECL module in my opinion would need to be improved if shipped as part of core PHP.
We've seen (and are still seeing) our fair share of issues with it. I managed to fix some with my less-than-adequate C skills, and the maintainers have definitely improved the stability recently. But we are, for example, still hitting a corruption issue with array references on a daily basis. I haven't been able to narrow it down or reproduce it reliably. In general, I suspect fuzzing the module might yield a bunch of interesting results.
Please don't interpret this as a rant on the maintainers' effort or skill, though; they have way more of both than I do. I'm not a contributor to PHP internals, I only subscribe to this list to stay up to date, but as a heavy user of msgpack in production I felt that chiming in with a word of caution would be appropriate given our experience.
I definitely like the idea of PHP standardizing on msgpack as its binary serialization format for the same reasons we've decided to use it. (And it'd most likely lead to significant improvement in the reliability of the implementation, which is of course in my own interest. ;) )
Marlies
See also http://news.php.net/php.internals/98834.
--
Christoph M. Becker
2017-06-28 11:04 GMT+02:00 Rasmus Schultz <rasmus@mindplay.dk>:
Or possibly both - that is, alias the pack/unpack functions with names
indicating they supersede the old un/serialize functions, but document the
new bin_* functions are using "an unspecified internal format".
Uhm, why unspecified?
There are many other uses for a set of efficient (space and time)
un/serialization functions, for example when packing small payloads
(checksums, image dimensions, etc.) into URLs, when persisting data in
cookies, etc.
Thoughts?
You could also just gzip larger payloads; I guess that should give a good
reduction in size.
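A minimal Python sketch of the gzip suggestion, assuming compact JSON as a stand-in for PHP's serialize() and base64url encoding for the URL/cookie use case mentioned earlier in the thread; pack_for_url() and unpack_from_url() are hypothetical names. gzip adds a fixed 10-byte header and 8-byte trailer, so it only pays off once the payload is large enough to compress.

```python
import base64
import gzip
import json


def pack_for_url(value):
    """Serialize to compact JSON, gzip it, then base64url-encode the result."""
    raw = json.dumps(value, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(gzip.compress(raw)).decode("ascii")


def unpack_from_url(token):
    """Reverse pack_for_url(): base64url-decode, gunzip, then parse the JSON."""
    raw = gzip.decompress(base64.urlsafe_b64decode(token.encode("ascii")))
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    rows = {"rows": [{"id": i, "status": "active"} for i in range(200)]}
    tiny = {"w": 640, "h": 480}
    for name, value in (("rows", rows), ("tiny", tiny)):
        plain = json.dumps(value, separators=(",", ":"))
        packed = pack_for_url(value)
        assert unpack_from_url(packed) == value
        # Compare the plain and packed sizes for a large and a tiny payload.
        print(f"{name}: {len(plain)} chars plain, {len(packed)} chars packed")
```

Running it shows the repetitive "rows" payload shrinking substantially while the tiny image-dimensions payload grows, which is consistent with limiting compression to larger payloads.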
Regards, Niklas
Hi!
Back in 2014 there was an informal proposal on the mailing list to replace
PHP serialization with an efficient binary (msgpack) format.
Why replace? If you have it as an extension, can't people that want to
use it just use it?
I'm still not sure why we need to do anything in core that can't be done
in an extension. Is there some handler missing that doesn't allow using a
custom serializer in some scenario?
Stas Malyshev
smalyshev@gmail.com
2017-06-29 2:13 GMT+03:00 Stanislav Malyshev <smalyshev@gmail.com>:
Hi!
Back in 2014 there was an informal proposal on the mailing list to replace
PHP serialization with an efficient binary (msgpack) format.
Why replace? If you have it as an extension, can't people that want to
use it just use it?
I'm still not sure why we need to do anything in core that can't be done
in an extension. Is there some handler missing that doesn't allow using a
custom serializer in some scenario?
PHP Session, Memcache, Redis, etc. - no way to configure a custom serializer.
Hi!
PHP Session,
http://php.net/manual/en/session.configuration.php#ini.session.serialize-handler ?
Memcache, Redis, etc. - no way to configure a custom serializer.
Not sure about those, but I don't see a problem adding a custom handler.
--
Stas Malyshev
smalyshev@gmail.com