PHP serialization is the slowest part of PHP sessions, NoSQL clients, ...
I would like PHP 7 to have a new serialization algorithm, or a custom
serialization handler.
In my opinion the best choice is msgpack; it is
+110% faster
-30% data size
HHVM discussed this issue, but it all boils down to backward compatibility with PHP
https://github.com/facebook/hhvm/issues/2654
What do you think about this? Maybe it's time to replace the old
serialization algorithm with something better?
PHP serialization is slowest in PHP Session, clients NoSQL, ...
I would like to have in PHP 7, a new serialization algorithm or custom
handler to serialize.
The latter is already possible, and there are many good extensions doing
that, igbinary for example.
Hi <person hiding behind a project>,
Backwards compatibility is one hurdle, but if you wipe all your serialised
data then begin to re-serialise using the new approach then you're fine.
As for whether to use msgpack or igbinary, well, there's already good support
for igbinary in PHP thanks to Pierre and others. You should benchmark
igbinary vs msgpack and come back with your findings if we're to evaluate
alternative serialization libraries.
Many thanks,
Paul
On Tue, Sep 23, 2014 at 12:23 AM, Park Framework park.framework@gmail.com
wrote:
PHP serialization is slowest in PHP Session, clients NoSQL, ...
I would like to have in PHP 7, a new serialization algorithm or custom
handler to serialize.
My opinion is that the best choice is to use msgpack, it is
+110% faster
-30% data size
HHVM discussed this issue, but all boils down to backward compatibility
with PHP
https://github.com/facebook/hhvm/issues/2654
What do you think about this, maybe it's time to change the old
algorithm serialization, on something better?
Performance testing, Msgpack VS Igbinary
igbinary: -20% slower, data size ~5%
The advantage of msgpack is that it is fast, and the format is understood
by many technologies: Java, Python, Lua in Redis.
2014-09-23 12:20 GMT+03:00 Paul Dragoonis dragoonis@gmail.com:
Hi <person hiding behind a project>,
Backwards compatibility is one hurdle, but if you wipe all your serialised
data then begin to re-serialise using the new approach then you're fine.
As for what to use msgpack or igbinary, well there's already good support
for igbinary in PHP thanks to Pierre and others. You should benchmark
igbinary vs msgpack and come back with your findings if we're to evaluate
alternative serialization libraries.
Many thanks,
Paul
Write an extension for it then, also share your benchmarks :)
On Tue, Sep 23, 2014 at 12:17 PM, Park Framework park.framework@gmail.com
wrote:
Performance testing, Msgpack VS Igbinary
igbinary: -20% slower, data size ~5%
Advantage Msgpack, he works fast, and this format understood by many
technologies - Java, Python, Lua in Redis.
Write an extension for it then, also share your benchmarks :)
Why go to all that trouble, 10 seconds on Google and we have:
http://pecl.php.net/package/msgpack
Write an extension for it then, also share your benchmarks :)
Why go to all that trouble, 10 seconds on Google and we have:
https://github.com/msgpack/msgpack-php
--
Pierre
@pierrejoye | http://www.libgd.org
I clearly didn't google. It would be interesting to see comparisons of
high-speed PHP serialization libraries. I for one would be happy, in PHP 7,
to break BC on the serialization format in favour of putting in a much
faster serializer by default. It's a similar scenario to putting in Zend
OPcache by default instead of APC.
Pierre, do you see merit in including <insert best overall serializer lib here> by default in PHP 7?
Not really, not because it is not good but because there will always be a
better one. We can't break the format in every release.
If the serialization method is not updated in PHP 7, it will never be
updated, and the default serialization in PHP 7 will be slow.
To maintain backward compatibility, PHP could support method calls
on primitive types, with the new serialization algorithms called only
through the new API.
$var->serialize()
$var->unserialize()
What do you think about this?
On Tue, Sep 23, 2014 at 7:36 PM, Park Framework
park.framework@gmail.com wrote:
If you do not update in PHP 7 serialization method, it will never be
updated, the default serialization in PHP 7 will be slow.
To maintain backward compatibility, can implement support method calls
on primitive types, new algorithms for serialization to be called only
in the new API.
$var->serialize()
$var->unserialize()
What do you think about this?
Not changing the serialize() format doesn't mean that other formats
can't be introduced via extensions. Though, I too would like to have
more of them available by default.
Cheers,
Andrey.
To maintain backward compatibility we leave the behaviour of
un/serialize() intact.
There are extensions for alternative serialisation methods and
regardless of whether any of these becomes a bundled extension, the
functionality should exist under a different set of function names.
They all exist already, either for automatic serialization or using
<extname>_(un)serialize, or pack/unpack.
Also, as has been said already, many different extensions exist, each of
them being good in one or more areas. It is like compression methods: use
the right one for the right task.
Hi!
I clearly didn't google, it would be interesting to see comparisons of high
speed PHP serialization libraries. I for one would be happy, in PHP 7, to
break BC serialization syntax in favour of putting in a much faster
serializer by default. Similar scenario to putting in Zend OpCache by
default instead of APC.
Why break anything? If you need a faster serializer, it's quite easy to
get one, including msgpack. If it is really an issue that is important
for people, we could include the package in core. But I don't see
breaking BC in serialize/unserialize as a big win here. If it's really a
bottleneck, a userspace package abstracting the specific serializer
function could easily be created, and most clients, like sessions,
already allow switching serializers through configuration. So a BC break
does not seem to be warranted here.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
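A minimal sketch of the kind of userspace abstraction mentioned above. The helper name pick_serializer() is hypothetical and not taken from any package. It assumes that igbinary and msgpack, when loaded, expose igbinary_serialize()/igbinary_unserialize() and msgpack_pack()/msgpack_unpack(), and it falls back to the built-in functions otherwise.

```php
<?php
// Hypothetical helper: returns an [encode, decode] pair of callables for the
// requested backend, falling back to PHP's native format when the matching
// extension is not loaded.
function pick_serializer(string $preferred = 'php'): array
{
    if ($preferred === 'igbinary' && function_exists('igbinary_serialize')) {
        return ['igbinary_serialize', 'igbinary_unserialize'];
    }
    if ($preferred === 'msgpack' && function_exists('msgpack_pack')) {
        return ['msgpack_pack', 'msgpack_unpack'];
    }
    return ['serialize', 'unserialize'];
}

// Calling code never names a concrete serializer function.
list($encode, $decode) = pick_serializer('msgpack');

$data = ['id' => 42, 'tags' => ['a', 'b']];
$blob = $encode($data);
$copy = $decode($blob);
```

Code written this way can change format by changing one argument, while data already stored with serialize() stays readable through unserialize(). For sessions the equivalent switch is the session.serialize_handler ini setting.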
Perhaps a compromise would be to choose the fastest method of
serialization and add it to PHP core.
Then add a directive in php.ini:
serialization.method = msgpack / Igbinary / ....
Hi!
We could, but what if you need to read/write data specifically with the
current PHP serializer? You'd then have to mess with runtime directives,
which doesn't look like a good design. That's like having one db_query()
function for all databases and a config parameter that switches the
global database type. I think the other option is better: to have
extensions for all the underlying functions and an abstraction layer (PDO
or userspace) that provides a unified API if needed.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
I agree, you're right.
My wish to override the existing serialize() algorithm comes from the
need to change the serialization method without changing the
source code (legacy code, PHP extensions).
2014-09-24 3:03 GMT+03:00 Stas Malyshev smalyshev@sugarcrm.com:
Hi!
We could, but what if you need to read/write data specifically from
current PHP serializer? You'd have to mess then with runtime directives,
it doesn't look like a good design. That's like having one db_query()
function for all databases and have a config parameter that switches the
global database type. I think the other option is better - to have
extensions for all underlying functions and abstraction layer (PDO or
userspace) that provides unified API if needed.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
Hi
2014-09-23 23:56 GMT+02:00 Park Framework park.framework@gmail.com:
In php.ini add the directive
serialization.method = msgpack / Igbinary / ....
There is an even better way to do this; add an additional parameter to
serialize and unserialize to serialize as and unserialize as:
$bin = serialize($data_struct, 'igbinary');
$data_struct = unserialize($bin, 'igbinary');
This keeps a clean, BC-safe solution without adding more clutter to the
ini file for runtime things. You could argue for adding something like
'serialize_default_mode = php;', but changing its default would lead to
cluttered code in which the optional second parameter always has to be
passed, so let's leave the ini out of this.
So what I propose here is:
- An internal API to register serializers, with PHP's current
  serialize implemented as one of them
- Add a second parameter to both serialize and unserialize that can
  be used to choose a serializer
- Optionally add a function like get_serialize_handlers() (so we
  won't have to parse phpinfo()) which lists the available serializers
- Consider bundling either igbinary or msgpack, or implementing a new
  custom and more efficient one that allows us to be future oriented
- No php.ini changes
- Allow users to register serialize handlers using
  register_serialize_handler()/unregister_serialize_handler()
- Optionally consider implementing this in SPL
Okay, late night, back to sleep!
--
regards,
Kalle Sommer Nielsen
kalle@php.net
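The proposal above can be approximated in userland today. The following is only a sketch: the class SerializerRegistry and its methods are hypothetical stand-ins for the suggested register_serialize_handler(), get_serialize_handlers() and the second parameter to serialize()/unserialize(). None of these exist in PHP, and JSON is used as the example userspace handler simply because it ships with PHP.

```php
<?php
// Userland sketch of the proposed registry; all names here are hypothetical.
final class SerializerRegistry
{
    /** @var array name => [encode callable, decode callable] */
    private $handlers = [];

    public function register(string $name, callable $encode, callable $decode): void
    {
        $this->handlers[$name] = [$encode, $decode];
    }

    public function unregister(string $name): void
    {
        unset($this->handlers[$name]);
    }

    /** Stand-in for the proposed get_serialize_handlers(). */
    public function handlers(): array
    {
        return array_keys($this->handlers);
    }

    /** Stand-in for serialize($data, 'name'). */
    public function serialize($data, string $handler = 'php'): string
    {
        return ($this->lookup($handler)[0])($data);
    }

    /** Stand-in for unserialize($bin, 'name'). */
    public function unserialize(string $bin, string $handler = 'php')
    {
        return ($this->lookup($handler)[1])($bin);
    }

    private function lookup(string $name): array
    {
        if (!isset($this->handlers[$name])) {
            throw new InvalidArgumentException("Unknown serializer: $name");
        }
        return $this->handlers[$name];
    }
}

$registry = new SerializerRegistry();
// The existing format is just one registered handler among others.
$registry->register('php', 'serialize', 'unserialize');
if (function_exists('igbinary_serialize')) {
    $registry->register('igbinary', 'igbinary_serialize', 'igbinary_unserialize');
}
// A handler written in userspace.
$registry->register(
    'json',
    function ($value) { return json_encode($value); },
    function ($string) { return json_decode($string, true); }
);

$bin = $registry->serialize(['a' => 1], 'json');
$data_struct = $registry->unserialize($bin, 'json');
```

The default handler stays 'php', so calls without a second argument behave like the current functions.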
Hi!
There is an even better way to do this; add an additional parameter to
serialize and unserialize to serialize as and unserialize as:
$bin = serialize($data_struct, 'igbinary');
$data_struct = unserialize($bin, 'igbinary');
This is cleaner, but if you can do this (code change), why can't you
just do igbinary_serialize($data_struct)?
- Optionally add a function like: get_serialize_handlers() (so we
won't have to parse phpinfo()) which are the available serializers
That actually makes a lot of sense, but serialize_get_handlers() might
be a better name, to group them together. But right now I don't think we
have such list, do we? We have php_session_register_serializer() and the
list for sessions, but not for other contexts.
- Allow users to register serialize handlers using
register_serialize_handler()/unregister_serialize_handler()
Do you think userspace serialize handlers would be popular? They would
be by necessity pretty slow compared to C ones.
Stanislav Malyshev, Software Architect
SugarCRM: http://www.sugarcrm.com/
Performance testing, Msgpack VS Igbinary
igbinary: -20% slower, data size ~5%
As with any benchmark, the details of the test are rather important.
Firstly, some data structures may be better handled than others, or be targeted for extra optimization, making some scenarios favour one method or the other. Some care needs to be taken in simulating one or more realistic use-cases.
Secondly, speed to serialize and speed to unserialize are separate measures: igbinary openly admits that it is best used for things like caching, where reading occurs more often than writing, as it is often slower than text-based methods at write-time, but faster at read-time.
Thirdly, the algorithms may have optional features which trade speed for space, or affect the above two points. For instance, igbinary's string interning, or the choice of structure used for objects in a PHP msgpack implementation.
All that taken into account, it's unlikely that any one format is better in all situations, and in some cases the existing text-based format may even have measurable advantages. Which points again to the idea of making more algorithms available as bundled extensions, and as session serialization methods, but not changing the meaning of existing functions.
--
Rowan Collins
[IMSoP]
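As a rough illustration of the points above, here is a sketch of a benchmark that times serializing and unserializing separately and also reports the encoded size. The function name bench_serializer() and the payload are placeholders, not anything from the thread. igbinary and msgpack are only included when their extensions are loaded, and a meaningful test would need payloads shaped like real application data.

```php
<?php
// Times encoding and decoding separately and reports the encoded size.
// Assumes $rounds >= 1.
function bench_serializer(callable $encode, callable $decode, $payload, int $rounds = 1000): array
{
    $blob = '';
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $blob = $encode($payload);
    }
    $encodeSeconds = microtime(true) - $start;

    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $decode($blob);
    }
    $decodeSeconds = microtime(true) - $start;

    return [
        'encode_s' => $encodeSeconds,
        'decode_s' => $decodeSeconds,
        'bytes'    => strlen($blob),
    ];
}

// Placeholder payload; results depend heavily on the data structure used.
$payload = ['user' => ['id' => 7, 'name' => 'example'], 'items' => range(1, 100)];

$candidates = ['php' => ['serialize', 'unserialize']];
if (function_exists('igbinary_serialize')) {
    $candidates['igbinary'] = ['igbinary_serialize', 'igbinary_unserialize'];
}
if (function_exists('msgpack_pack')) {
    $candidates['msgpack'] = ['msgpack_pack', 'msgpack_unpack'];
}

$results = [];
foreach ($candidates as $name => $pair) {
    $results[$name] = bench_serializer($pair[0], $pair[1], $payload);
}
print_r($results);
```

Keeping the write and read timings apart shows the case where a format is slower to write but faster to read.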
Park Framework wrote (on 23/09/2014):
PHP serialization is slowest in PHP Session, clients NoSQL, ...
I would like to have in PHP 7, a new serialization algorithm or custom
handler to serialize.
My opinion is that the best choice is to use msgpack, it is
+110% faster
-30% data size
HHVM discussed this issue, but all boils down to backward compatibility with PHP
https://github.com/facebook/hhvm/issues/2654
What do you think about this, maybe it's time to change the old
algorithm serialization, on something better?
Apart from the BC implications, using a binary serialization by default
might cause issues with anyone who is storing or passing the serialized
data somewhere which is not binary-safe. Admittedly, any object with
private properties generates a serialized form with null bytes, but many
values will consist entirely of ASCII characters, and some code may rely
on this being the case.
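The null-byte point can be checked with a few lines. The class name Account is just an example.

```php
<?php
class Account
{
    private $secret = 'x';
    public $name = 'demo';
}

$withPrivate = serialize(new Account());
$plainArray  = serialize(['name' => 'demo']);

// The private property name is stored as "\0Account\0secret", so the
// serialized object contains NUL bytes.
var_dump(strpos($withPrivate, "\0") !== false);

// An array of plain ASCII strings serializes to plain ASCII text.
var_dump(strpos($plainArray, "\0") !== false);
```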
The format is also widely known, and has been implemented in other
languages for compatibility (although it is not suitable for
untrusted data exchange, as Anthony Ferrara tweeted a few months ago:
https://twitter.com/ircmaxell/status/452182852562862080)
We already have pluggable serializers for sessions (needed because the
serialization happens implicitly in the session handling code), and can
add as many functions for types of serialization as seem sensible, so
I'm not sure what the benefit of changing serialize()/unserialize()
themselves is.
Changing the default session serialization might be worth considering,
though, along with bundling something like igbinary or msgpack.
Oh, and a non-batshit version of session_decode() for manually invoking
session (un)serialization handlers :P
--
Rowan Collins
[IMSoP]