Hi,
Completely different topic :)
I've been looking a bit into the performance of JSON encoding, hashing + encryption (AES), and serialize()/unserialize(), i.e. data that is marshaled and often transmitted over the wire.
I know there have been some high-end apps that have benefited from custom serializers, etc. (typically platform dependent).
I wonder if people here think improvements in these areas would move the needle for the majority of mainstream apps or not.
Thanks,
Andi
I think there is much to gain by improving the serialization speed in
PHP. It is used everywhere, from caches like memcache to sessions or
manual data input into a DB. I would say that there are very few
non-trivial apps that would not benefit from a more compact and faster
serializer.
In our specific use case at work, switching to igbinary improved the speed
of the overall page generation by 2-3%.
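
To get a rough feel for the difference on a given workload, the built-in encoders can be timed against each other. The following is a minimal sketch, not a rigorous benchmark: the payload shape, the iteration count, and the helper name `time_roundtrip` are assumptions made for illustration, and igbinary is only included if the extension happens to be loaded.

```php
<?php
// Hypothetical helper: time $iterations encode/decode round trips and
// return [elapsed seconds, encoded size in bytes].
function time_roundtrip(callable $encode, callable $decode, $payload, int $iterations = 1000): array
{
    $encoded = $encode($payload);
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $decode($encode($payload));
    }
    return [microtime(true) - $start, strlen($encoded)];
}

// Assumed sample payload: a list of small associative arrays, similar to cached rows.
$payload = [];
for ($i = 0; $i < 200; $i++) {
    $payload[] = ['id' => $i, 'name' => "user$i", 'score' => $i * 1.5, 'active' => ($i % 2 === 0)];
}

$codecs = [
    'serialize' => ['serialize', 'unserialize'],
    'json'      => ['json_encode', function (string $s) { return json_decode($s, true); }],
];
// igbinary is an optional extension; only benchmark it when it is available.
if (extension_loaded('igbinary')) {
    $codecs['igbinary'] = ['igbinary_serialize', 'igbinary_unserialize'];
}

foreach ($codecs as $name => [$enc, $dec]) {
    [$seconds, $bytes] = time_roundtrip($enc, $dec, $payload);
    printf("%-10s %8.4f s  %6d bytes\n", $name, $seconds, $bytes);
}
```

The absolute numbers depend heavily on the PHP version and the data, so a sketch like this is only useful for comparing codecs on your own payloads.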
Hi,
I think it would. A lot of sites/apps nowadays rely heavily on JSON encoding/decoding, and many technologies depend on serialization or JSON encoding at the PHP level (Memcached and Redis, to name a few), which can eat a lot of performance if you use it heavily.
On the other hand, it's not too difficult to set up igbinary as the serializer instead of the default one, and it gives a pretty good improvement.
Have you ever considered making it a part of PHP? What would the implications be (I'm not too familiar with PHP internals besides extensions)?
Michel Bartz
Lead Developer
Manwin Canada
Skype: michel.php
-----Original Message-----
From: Andi Gutmans [mailto:andi@zend.com]
Sent: Thursday, November 25, 2010 12:47 PM
To: internals@lists.php.net
Subject: [PHP-DEV] Performance of buffer based functionality (JSON, AES, serialize())
hi,
For the record here, igbinary is a very good example of such optimization:
http://opensource.dynamoid.com/
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
igbinary is a nice extension indeed. However, for those of us who have
environments which include multiple programming languages, custom
serializations become a PITA. As such, we generally go with something more
portable such as Avro or straight JSON. A while back, I did some work
rewriting the JSON serialization functions to use the fast (and BSD
licensed) yajl JSON parser (https://github.com/lloyd/yajl). Initial
benchmarks showed a 4-7% performance improvement in
serialization/deserialization.
I'll see if I can dig it up--hopefully it's not on my dead computer.
--
Jonah H. Harris
Blog: http://www.oracle-internals.com/
Good point indeed. That makes me think about bson
(http://bsonspec.org/), which is used by mongodb for example.
--
Pierre
@pierrejoye | http://blog.thepimp.net | http://www.libgd.org
I just read over the BSON spec and it looks fairly interesting. The only bit
that appears to be missing for PHP purposes is object support, so we
would need to introduce a custom type on top of standard BSON. However,
from a compactness and consistency standpoint it looks fairly appealing.
It's nice, but as long as browsers don't implement it natively, it's less useful for server-to-client communication.
Of course, it can still be quite useful with custom I/O or data sources that implement it natively, e.g. MongoDB.
-----Original Message-----
From: Ilia Alshanetsky [mailto:ilia@prohost.org]
Sent: Thursday, November 25, 2010 4:16 PM
To: Pierre Joye
Cc: Jonah H. Harris; Andi Gutmans; internals@lists.php.net
Subject: Re: [PHP-DEV] Performance of buffer based functionality (JSON, AES, serialize())
As people have mentioned, improving (un)serialize speed would be a huge
benefit, especially for caching data sets or large objects.
From experience, it would seem valuable to have:
- serialize_text($var)
The existing serialize() minus the NULL bytes on private properties. These have
been a source of problems for developers serializing an object with private
properties and storing it in a database (the string may get cut off).
I'm not sure why there's a NULL byte in 'zend_mangle_property_name'; instead,
the char "_" could be used to mark a private property in the serialized
text.
unserialize() could stay backward compatible by accepting both NULL and "_"
around a private property.
- serialize_binary($var)
An efficient and compact serialization using techniques from igbinary.
A potential problem with igbinary I've noticed is that it packs a double as a
64-bit integer.
That could be a problem if you serialize on a platform that has an IEEE 754
binary representation and unserialize on a non-IEEE platform, but I don't
know if PHP compiles on architectures that are non-IEEE.
It could also be interesting to pack integers as varints:
http://code.google.com/apis/protocolbuffers/docs/encoding.html#varints
http://protobuf-c.googlecode.com/svn/trunk/src/google/protobuf-c/protobuf-c.c
That's most likely slower, though, than what igbinary does with integers.
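
For readers unfamiliar with varints, here is a minimal sketch of the base-128 scheme described on the Protocol Buffers encoding page linked above. The function names `varint_encode` and `varint_decode` are hypothetical, the sketch handles non-negative integers only, and it says nothing about how igbinary itself stores integers.

```php
<?php
// Encode a non-negative integer as a base-128 varint: 7 payload bits per
// byte, least significant group first, high bit set on all but the last byte.
function varint_encode(int $value): string
{
    if ($value < 0) {
        throw new InvalidArgumentException('this sketch handles non-negative integers only');
    }
    $out = '';
    while ($value > 0x7F) {
        $out .= chr(($value & 0x7F) | 0x80);
        $value >>= 7;
    }
    return $out . chr($value);
}

// Decode a varint starting at $offset; returns [value, number of bytes consumed].
function varint_decode(string $bytes, int $offset = 0): array
{
    $value = 0;
    $shift = 0;
    $length = strlen($bytes);
    for ($i = $offset; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        $value |= ($byte & 0x7F) << $shift;
        if (($byte & 0x80) === 0) {
            return [$value, $i - $offset + 1];
        }
        $shift += 7;
    }
    throw new UnexpectedValueException('truncated varint');
}

echo bin2hex(varint_encode(1)), "\n";   // 01
echo bin2hex(varint_encode(300)), "\n"; // ac02
```

Small values take a single byte, which is where the space saving comes from; the per-byte loop is also why this may be slower than a fixed-width encoding.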
I guess the serialize mechanism can't use any char that can be part of a
PHP variable name, and "_" can. As property names respect binary
compatibility, the only char that can be used to mark private
properties is actually the NULL byte. Ping me if I'm wrong.
But I'm +1 for improving serialize() speed. I had problems with it
recently, and igbinary came to save me as well :)
Julien.Pauli
Right, what I was proposing didn't make sense. After digging through the source, say we have:
class Foo {
    public $a = 1;
    protected $b = 2;
    private $c = 3;
}
Currently this is (\0 denotes a NULL byte):
O:3:"Foo":3:{s:1:"a";i:1;s:4:"\0*\0b";i:2;s:6:"\0Foo\0c";i:3;}
An alternative could be:
O:3:"Foo":3:{s:1:"a";i:1;*;s:4:"b";i:2;_;s:6:"c";i:3;}
Where "*;" is a marker for protected and "_;" is a marker for private.
It would involve some trickery in ext/standard/var_unserializer.re:
"*;" {
    /* prepend \0*\0 to the next key so that we have zend_symtable_find("\0*\0b") */
}
"_;" {
    /* prepend \0Foo\0 to the next key so that we have zend_symtable_find("\0Foo\0c") */
}
Just a thought, in case someone wants to refactor it or look into performance. I believe that approach would support both formats:
O:3:"Foo":3:{s:1:"a";i:1;*;s:4:"b";i:2;_;s:6:"c";i:3;}
O:3:"Foo":3:{s:1:"a";i:1;s:4:"\0*\0b";i:2;s:6:"\0Foo\0c";i:3;}
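
The mangled property names discussed above can be inspected directly. This is a small sketch using the same `Foo` class; the truncation step is only a simulation of what a storage layer that treats the value as a C string might do, not the behavior of any particular database.

```php
<?php
class Foo {
    public $a = 1;
    protected $b = 2;
    private $c = 3;
}

$s = serialize(new Foo());

// Make the NUL bytes visible by printing each one as the two characters "\0".
echo str_replace("\0", '\0', $s), "\n";
// O:3:"Foo":3:{s:1:"a";i:1;s:4:"\0*\0b";i:2;s:6:"\0Foo\0c";i:3;}

// Anything that treats the value as a C string stops at the first NUL byte.
$cut = strstr($s, "\0", true);
echo $cut, "\n";
// O:3:"Foo":3:{s:1:"a";i:1;s:4:"
```

The declared lengths (4 and 6) include the NUL bytes and the `*` or class-name prefix, which is why the marker-based alternative would need the unserializer to rebuild the prefix before the property lookup.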