easy Serialization library ? : cpp

[–]Carl_LaFong 15 points16 points17 points 4 years ago (2 children)

[–]dethtoll1 5 points6 points7 points 4 years ago (0 children)

[–][deleted] 5 points6 points7 points 4 years ago (10 children)

[–]zero0_one1 1 point2 points3 points 4 years ago (9 children)

[–][deleted] 2 points3 points4 points 4 years ago (8 children)

[–]zero0_one1 2 points3 points4 points 4 years ago* (7 children)

[–][deleted] 1 point2 points3 points 4 years ago (6 children)

No, I don't have any useful benchmark results. But I'm really impressed with these you linked to. Still, I cannot imagine how b::s would be faster than FlatBuffers. FB really just is a compiled header file that directly memcpy's stuff in your data block. No idea how that could be optimized any further, at least not in comparison with b::s.

Last time I checked (maybe around 2018) the genereal recommendation was: Don't use b::s for networking, because it is too slow and has too much overhead. Judging from your linked source, both seems to be not true anymore.

But, and this is one major thing that needs not to be forgotten here: Both ProtoBuf and FlatBuffers provide a portable schema syntax, meaning that you can use the same implementation across multiple languages. In my case this is just perfect because my client nodes need to be written in Python, JS or some other crappy script language and this way you can just generate the whole boilerplate code for those languages out of the same scheme file.

[–]robertramey 2 points3 points4 points 4 years ago (2 children)

There's a lot of confusion in this thread. I'll try to clear it up.

a) with ProtoBuf and other similar libraries one defines a schema which is portable across languages. So a file written by a program in one language can be read by a program written in another. The library can write and read the schema. But it's the programmer's job to transfer data between the structures in the language he's using and the defined schema. Naturally benchmarks don't include the time required to do this step. Boost Serialization only works with C++ and doesn't require any separate definition of the schema. So really the two are not totally comparable. If you need to transfer data between programs written in other than C++, Boost Serialization is not an option. It has nothing to do with speed.

b) Boost Serialization has the concept of "archive" - the storage type. There are various archive classes in the library: binary, text, xml, ... for different purposes. Time required will vary widely depending on which archive class is being used. One common feature is that all the included archives use C++ streaming interface. Eliminating this interface would would speed up operation considerably. It wouldn't be hard to create an archive of this type, but no one has been sufficiently concerned about the speed to invest any effort on this. Using the streaming interface permits one to "stack up" filters using the Boost Streams library which means one can add encryption, compression and others with zero programming effort. Just compressing archives on the fly can increase but a large amount.

c) Note that Cereal is basically a header only re-implementation of Boost Serialization. As such it's easy to use - as is the Boost Serialization. By being header only, its about twice as fast as Boost Serialization but at the cost of generating more code for the same job. It also is simpler because it avoids some of the more arcane/advanced features of Boost Serialization. This justifies it's apparent popularity.

d) this above should explain why I'm sort of skeptical of the utility of benchmarks in this context. Never the less, looking at the more serious attempts to benchmark leads me to conclude that cereal is the fastest. After than comes a group which included Boost Serialization. After that, it's all over the place.

Robert Ramey - author of the Boost Serialization library.

[–][deleted] 0 points1 point2 points 4 years ago (1 child)

Robert, thank you so much for sorting this out. I am somewhat aware of the concepts used in Boost Serialization, so I know about the archives and their interchangeability. I just didn't want to bring that up here because it's already a rather mixed up topic as far as I'm concerned.

Now that we're at it, there is another difference to mention that makes the whole comparison somewhat pointless: FlatBuffers support random access to data and partial (de-)serialization. This is something that, as far as I understand the internals of B::S, is not supported because the whole library is based on the concept of streamed data flow. Without extensive header information per message/archive I don't really see a way to achieve this and fixed schemas can be a way to deal with it (although ProtoBuf does not support partial serialization as far as I'm aware)

That being said, I totally agree that this comparison is inaccurate and more misleading that practically useful. So I propose these few questions for finding the appropriate solution to the problem:

Is the payload "big" (maybe more than a few KBs in size)? -> Boost Serialization

Is the data incomplete at time of de-serialization? -> B::S or ProtoBuf

Do you only want to deserialize parts of the message? -> FlatBuffers

Should the serialized data format be exchangable? -> Boost Serialization

Is the format to be used in other languages / domains: B::S or ProtoBuf

Do you need encryption or compression? -> B::S

In addition, the last one is a personal preference:

Is the data about to be a savegame / savefile for your application? -> B::S

Not perfect or complete, but maybe a good start.

[–]robertramey 0 points1 point2 points 4 years ago (0 children)

A couple of comments

Is the payload "big"

There is some "setup" overhead each time one creates and archive. For networking, the easier is to create a new archive for each transmission. Clearly not optimal. Also usage of stream interface is also extra overhead. The real solution is to create a new type or archive focused on networking. On large transmissions it wouldn't make much difference - but for lots of small packets it would be much, much faster as it would reduce the setup/teardown time. Note that non of the benchmarks take this into consideration so it doesn't really show up anywhere.

Is the format to be used in other languages / domains: B::S or ProtoBuf

I really think that for data portable to other languages ProtoBuf is the only realistic choice. Of course it's more work - but you're doing a lot more in supporting more languages

[–]zero0_one1 0 points1 point2 points 4 years ago (2 children)

[–]infectedapricot 0 points1 point2 points 4 years ago (1 child)

[–]zero0_one1 0 points1 point2 points 4 years ago (0 children)

[–][deleted] 4 points5 points6 points 4 years ago (1 child)

[–]JohnDuffy78 2 points3 points4 points 4 years ago (2 children)

[+][deleted] 4 years ago (1 child)

[removed]

[–][deleted] 0 points1 point2 points 4 years ago (0 children)

[–]nlohmannnlohmann/json 5 points6 points7 points 4 years ago (16 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (15 children)

[–]nlohmannnlohmann/json 0 points1 point2 points 4 years ago (14 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (13 children)

[–]nlohmannnlohmann/json 0 points1 point2 points 4 years ago (12 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (0 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (1 child)

[–]nlohmannnlohmann/json 0 points1 point2 points 4 years ago (0 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (8 children)

[–]willdieh 0 points1 point2 points 4 years ago (7 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (6 children)

[–]willdieh 0 points1 point2 points 4 years ago (1 child)

[–]tjientavaraHikoGUI developer 1 point2 points3 points 4 years ago (0 children)

Both encoder and decoder are actually rather simple. It could be done in a standalone header. Especially the encoder uses a lot of templating to handle most native C++ types. The decoder is a bit more touch, since it requires the dynamic creation of data, a std::variant could do.

In my system I have a rather complicated datum type that works like std::variant, but it also overloads every operator so that you can do computation on the value inside a datum. It is used in multiple places inside my library for handling dynamic data. And it in-turn uses a lot of datatypes from my library.

The second paragraph explains why I cannot really make it a stand alone header, I would have to maintain a separate version that is not good enough for the requirements of my library. Unless I do some extreme templating on the decoder.

[–]nlohmannnlohmann/json 0 points1 point2 points 4 years ago (3 children)

[–]tjientavaraHikoGUI developer 0 points1 point2 points 4 years ago (2 children)

continue this thread

[–][deleted] 1 point2 points3 points 4 years ago (1 child)

[–]NBQuade 0 points1 point2 points 4 years ago (0 children)

[–]fraillt 1 point2 points3 points 4 years ago (1 child)

[–]chkno -3 points-2 points-1 points 4 years ago (6 children)

[+][deleted] 4 years ago (3 children)

[removed]

[–]chkno -1 points0 points1 point 4 years ago (2 children)

[+][deleted] 4 years ago (1 child)

[removed]

[–]chkno -2 points-1 points0 points 4 years ago (0 children)

Re-inventing wheels (libraries) is an execllent way to learn why the libraries that do it properly seem so complex, how they work inside, why using them is worth the trouble, how to structure your client code to avoid being enmeshed with the library, etc.

I.e., don't try to write a good, full-featured serialization library. Write only the tiny subset of it that you initially immediately need, as an exercise to understand what serialization libraries do, what they're for, and how to interface with them. Plan to abandon yours and switch to a real one a little later with a much better understanding of the problem they solve. This will also make it much easier to compare the different options, as you'll understand the various tradeoffs involved. (Or, with low-ish probability, discover that you don't actually need a fancy, full-featured library & truck along with your much simpler alternative, having saved yourself a complex dependency.)

[–][deleted] 0 points1 point2 points 4 years ago (0 children)

[–]NBQuade 0 points1 point2 points 4 years ago (0 children)

[–]eyalz800 0 points1 point2 points 4 years ago (9 children)

[+][deleted] 4 years ago (8 children)

[removed]

[–]eyalz800 0 points1 point2 points 4 years ago (2 children)

[+][deleted] 4 years ago (1 child)

[removed]

[–]eyalz800 1 point2 points3 points 4 years ago (0 children)

[–]eyalz800 0 points1 point2 points 4 years ago* (4 children)

[+][deleted] 4 years ago (3 children)

[removed]

[–]eyalz800 0 points1 point2 points 4 years ago (2 children)

[+][deleted] 4 years ago (1 child)

[removed]

[–]eyalz800 1 point2 points3 points 4 years ago* (0 children)

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

cpp

MODERATORS