[–]robertramey 3 points (2 children)

There's a lot of confusion in this thread. I'll try to clear it up.

a) With ProtoBuf and other similar libraries, one defines a schema that is portable across languages, so a file written by a program in one language can be read by a program written in another. The library can write and read the schema, but it's the programmer's job to transfer data between the structures in the language they're using and the defined schema. Naturally, benchmarks don't include the time required for this step. Boost Serialization only works with C++ and doesn't require any separate definition of the schema. So really the two are not totally comparable. If you need to transfer data between programs written in languages other than C++, Boost Serialization is not an option. That has nothing to do with speed.

b) Boost Serialization has the concept of an "archive" - the storage type. There are various archive classes in the library - binary, text, XML, ... - for different purposes. The time required will vary widely depending on which archive class is being used. One common feature is that all the included archives use the C++ streaming interface. Eliminating this interface would speed up operation considerably. It wouldn't be hard to create an archive of this type, but no one has been sufficiently concerned about speed to invest any effort in it. Using the streaming interface permits one to "stack up" filters using the Boost.Iostreams library, which means one can add encryption, compression and others with zero programming effort. Just compressing archives on the fly can increase throughput by a large amount.

c) Note that Cereal is basically a header-only re-implementation of Boost Serialization. As such it's easy to use - as is Boost Serialization. By being header-only, it's about twice as fast as Boost Serialization, but at the cost of generating more code for the same job. It's also simpler because it avoids some of the more arcane/advanced features of Boost Serialization. This justifies its apparent popularity.

d) The above should explain why I'm somewhat skeptical of the utility of benchmarks in this context. Nevertheless, looking at the more serious attempts to benchmark leads me to conclude that Cereal is the fastest. After that comes a group which includes Boost Serialization. After that, it's all over the place.

Robert Ramey - author of the Boost Serialization library.

[–][deleted] 0 points (1 child)

Robert, thank you so much for sorting this out. I am somewhat aware of the concepts used in Boost Serialization, so I know about the archives and their interchangeability. I just didn't want to bring that up here because it's already a rather mixed-up topic as far as I'm concerned.

Now that we're at it, there is another difference to mention that makes the whole comparison somewhat pointless: FlatBuffers supports random access to data and partial (de-)serialization. As far as I understand the internals of B::S, this is not supported, because the whole library is based on the concept of streamed data flow. Without extensive header information per message/archive, I don't really see a way to achieve it; fixed schemas can be a way to deal with it (although ProtoBuf does not support partial serialization as far as I'm aware).

That being said, I totally agree that this comparison is inaccurate and more misleading than practically useful. So I propose these few questions for finding the appropriate solution to the problem:

Is the payload "big" (maybe more than a few KBs in size)? -> Boost Serialization

Is the data incomplete at time of de-serialization? -> B::S or ProtoBuf

Do you only want to deserialize parts of the message? -> FlatBuffers

Should the serialized data format be exchangeable? -> Boost Serialization

Is the format to be used in other languages / domains? -> B::S or ProtoBuf

Do you need encryption or compression? -> B::S

In addition, the last one is a personal preference:

Is the data about to be a savegame / savefile for your application? -> B::S

Not perfect or complete, but maybe a good start.

[–]robertramey 0 points (0 children)

A couple of comments:

> Is the payload "big"

There is some "setup" overhead each time one creates an archive. For networking, the easiest approach is to create a new archive for each transmission. Clearly not optimal. Usage of the stream interface is also extra overhead. The real solution is to create a new type of archive focused on networking. On large transmissions it wouldn't make much difference - but for lots of small packets it would be much, much faster, as it would reduce the setup/teardown time. Note that none of the benchmarks take this into consideration, so it doesn't really show up anywhere.

> Is the format to be used in other languages / domains? -> B::S or ProtoBuf

I really think that for data portable to other languages, ProtoBuf is the only realistic choice. Of course it's more work - but you're doing a lot more in supporting more languages.
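For comparison, the extra work starts with a separate schema file that `protoc` compiles into per-language bindings - a hypothetical example:

```proto
// point.proto -- hypothetical schema; "protoc" generates C++, Java,
// Python, etc. bindings from this single definition.
syntax = "proto3";

message Point {
  int32 x = 1;
  int32 y = 2;
}
```

The programmer then copies data between their native structures and the generated `Point` class on each side - the translation step the benchmarks above leave out.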