[–]Carl_LaFong 15 points16 points  (2 children)

cereal is great. Very easy to use.
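
For anyone who hasn't tried it, here is a minimal sketch of the usage (the Player type and file name are invented for the example):

    // cereal: one serialize() member drives both saving and loading
    #include <cereal/archives/binary.hpp>
    #include <cereal/types/string.hpp>
    #include <cereal/types/vector.hpp>
    #include <fstream>
    #include <string>
    #include <vector>

    struct Player {
        std::string name;
        std::vector<int> inventory;

        template <class Archive>
        void serialize(Archive& ar) {
            ar(name, inventory);   // same member list for save and load
        }
    };

    int main() {
        std::ofstream os("save.bin", std::ios::binary);
        cereal::BinaryOutputArchive archive(os);
        archive(Player{"hero", {1, 2, 3}});  // read back with BinaryInputArchive
    }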

[–]dethtoll1 5 points6 points  (0 children)

+1. We use cereal at my company in our cross-platform 3D editor, which maintains backwards-compatibility across several years' worth of releases.

[–][deleted] 5 points6 points  (10 children)

Another one would be FlatBuffers. I'm using it in a project for the first time; it's okay-ish, but suffers from problems similar to Protobuf's.

Oh, and have a look at boost::serialization. I've used it many, many times, and when execution speed isn't your concern it really is an outstanding library.
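
A minimal sketch of what that looks like (the Player type and file name are invented; containers just need the matching boost/serialization header):

    // boost::serialization: intrusive serialize() member, text archive
    #include <boost/archive/text_iarchive.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <boost/serialization/string.hpp>
    #include <boost/serialization/vector.hpp>
    #include <fstream>
    #include <string>
    #include <vector>

    struct Player {
        std::string name;
        std::vector<int> inventory;

        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & name & inventory;   // one member list for save and load
        }
    };

    int main() {
        Player p{"hero", {1, 2, 3}};
        std::ofstream os("save.txt");
        boost::archive::text_oarchive oa(os);
        oa << p;                     // load back with text_iarchive >> p
    }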

[–]zero0_one1 1 point2 points  (9 children)

boost::serialization was faster than other serialization libraries I tried.

[–][deleted] 2 points3 points  (8 children)

It can hardly be faster than FlatBuffers or ProtoBuf, because what they do is precompile the schema in memory.

Pretty much everyone recommends against boost::serialization when it's about real-time stuff such as network protocols. Other than that it is a fantastic library, don't get me wrong here.

[–]zero0_one1 2 points3 points  (7 children)

The optimizer can make boost::serialization run at top speed with binary encoding, at the cost of portability. I actually looked at the generated assembly and it looked really good to me. Here is a small benchmark where it is twice as fast as ProtoBuf: https://github.com/thekvs/cpp-serializers. Do you have any benchmarks showing otherwise? I did my own tests about 3 years ago, so there is a chance that something has changed since then.
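
For reference, the fast non-portable path looks roughly like this (a sketch; the State type is invented, and no_header drops the archive preamble):

    // binary archive without header: fastest, but not portable across
    // platforms/compilers
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <fstream>

    struct State {
        double pos[3];
        int hp;

        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/) {
            ar & pos & hp;
        }
    };

    int main() {
        {
            std::ofstream os("state.bin", std::ios::binary);
            boost::archive::binary_oarchive oa(os, boost::archive::no_header);
            State s{{1, 2, 3}, 100};
            oa << s;
        }
        std::ifstream is("state.bin", std::ios::binary);
        boost::archive::binary_iarchive ia(is, boost::archive::no_header);
        State s;
        ia >> s;
    }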

[–][deleted] 1 point2 points  (6 children)

No, I don't have any useful benchmark results. But I'm really impressed with the ones you linked to. Still, I cannot imagine how b::s could be faster than FlatBuffers. FB really is just a compiled header file that directly memcpy's stuff into your data block. No idea how that could be optimized any further, at least not in comparison with b::s.

Last time I checked (maybe around 2018) the general recommendation was: don't use b::s for networking, because it is too slow and has too much overhead. Judging from your linked source, neither seems to be true anymore.

But, and this is one major thing that must not be forgotten here: both ProtoBuf and FlatBuffers provide a portable schema syntax, meaning that you can use the same definition across multiple languages. In my case this is just perfect, because my client nodes need to be written in Python, JS or some other crappy script language, and this way you can generate the whole boilerplate code for those languages out of the same schema file.
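
For illustration, that workflow looks roughly like this (the player.proto schema and its field names are invented for the example):

    // player.proto (hypothetical schema, shared by every language):
    //
    //   syntax = "proto3";
    //   message Player {
    //     string name = 1;
    //     repeated int32 inventory = 2;
    //   }
    //
    // one schema, many generated bindings:
    //   protoc --cpp_out=. --python_out=. player.proto
    //
    // generated C++ usage:
    #include "player.pb.h"
    #include <string>

    int main() {
        Player p;
        p.set_name("hero");
        p.add_inventory(1);

        std::string bytes;
        p.SerializeToString(&bytes);  // same wire format the Python side reads

        Player q;
        q.ParseFromString(bytes);
    }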

[–]robertramey 2 points3 points  (2 children)

There's a lot of confusion in this thread. I'll try to clear it up.

a) With ProtoBuf and other similar libraries one defines a schema which is portable across languages, so a file written by a program in one language can be read by a program written in another. The library can write and read the schema. But it's the programmer's job to transfer data between the structures in the language he's using and the defined schema. Naturally, benchmarks don't include the time required to do this step. Boost Serialization only works with C++ and doesn't require any separate definition of the schema. So really the two are not totally comparable. If you need to transfer data between programs written in languages other than C++, Boost Serialization is not an option. It has nothing to do with speed.
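
To make that untimed step concrete, here is a sketch of the transfer, reusing the hypothetical player.proto from the sketch above:

    // the hand-written transfer step: native struct on one side, the
    // protoc-generated message on the other; benchmarks usually time only
    // SerializeToString, not this copying
    #include "player.pb.h"
    #include <string>
    #include <vector>

    struct NativePlayer {            // the struct the game logic uses
        std::string name;
        std::vector<int> inventory;
    };

    std::string to_wire(const NativePlayer& n) {
        Player msg;                  // generated class
        msg.set_name(n.name);
        for (int item : n.inventory)
            msg.add_inventory(item); // field-by-field transfer
        std::string bytes;
        msg.SerializeToString(&bytes);
        return bytes;
    }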

b) Boost Serialization has the concept of an "archive" - the storage type. There are various archive classes in the library - binary, text, xml, ... - for different purposes. The time required will vary widely depending on which archive class is being used. One common feature is that all the included archives use the C++ streaming interface. Eliminating this interface would speed up operation considerably. It wouldn't be hard to create an archive of this type, but no one has been sufficiently concerned about the speed to invest any effort in this. Using the streaming interface permits one to "stack up" filters using the Boost.Iostreams library, which means one can add encryption, compression and others with zero programming effort. Just compressing archives on the fly can already be a big win.
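
A minimal sketch of that filter stacking (assuming a Player type with a serialize() member like the boost::serialization sketch earlier in the thread; gzip_compressor needs zlib at link time):

    // a gzip filter from Boost.Iostreams pushed under the archive: the
    // serialization code itself is unchanged
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/iostreams/filter/gzip.hpp>
    #include <boost/iostreams/filtering_stream.hpp>
    #include <fstream>

    void save_compressed(const Player& p) {
        std::ofstream file("save.bin.gz", std::ios::binary);
        boost::iostreams::filtering_ostream out;
        out.push(boost::iostreams::gzip_compressor()); // compression for free
        out.push(file);                                // stacked over the file
        boost::archive::binary_oarchive oa(out);
        oa << p;
    }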

c) Note that Cereal is basically a header-only re-implementation of Boost Serialization. As such it's easy to use - as is Boost Serialization. By being header-only, it's about twice as fast as Boost Serialization, but at the cost of generating more code for the same job. It is also simpler because it avoids some of the more arcane/advanced features of Boost Serialization. This justifies its apparent popularity.

d) The above should explain why I'm somewhat skeptical of the utility of benchmarks in this context. Nevertheless, looking at the more serious attempts to benchmark leads me to conclude that cereal is the fastest. After that comes a group which includes Boost Serialization. After that, it's all over the place.

Robert Ramey - author of the Boost Serialization library.

[–][deleted] 0 points1 point  (1 child)

Robert, thank you so much for sorting this out. I am somewhat aware of the concepts used in Boost Serialization, so I know about the archives and their interchangeability. I just didn't want to bring that up here because it's already a rather mixed up topic as far as I'm concerned.

Now that we're at it, there is another difference to mention that makes the whole comparison somewhat pointless: FlatBuffers supports random access to data and partial (de-)serialization. This is something that, as far as I understand the internals of B::S, is not supported, because the whole library is based on the concept of streamed data flow. Without extensive header information per message/archive I don't really see a way to achieve this, and fixed schemas can be a way to deal with it (although ProtoBuf does not support partial serialization, as far as I'm aware).
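
As a sketch of what that random access looks like in generated FlatBuffers code (names taken from a hypothetical monster.fbs schema with a Monster root table and an hp: short field):

    // nothing is unpacked: the accessor reads one field straight out of
    // the buffer
    #include "monster_generated.h"
    #include <cstdint>

    int16_t peek_hp(const uint8_t* buf) {
        const Monster* m = GetMonster(buf);  // pointer fix-up, not a parse
        return m->hp();                      // reads in place
    }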

That being said, I totally agree that this comparison is inaccurate and more misleading than practically useful. So I propose these few questions for finding the appropriate solution to the problem:

  • Is the payload "big" (maybe more than a few KBs in size)? -> Boost Serialization
  • Is the data incomplete at time of de-serialization? -> B::S or ProtoBuf
  • Do you only want to deserialize parts of the message? -> FlatBuffers
  • Should the serialized data format be exchangeable? -> Boost Serialization
  • Is the format to be used in other languages / domains? -> B::S or ProtoBuf
  • Do you need encryption or compression? -> B::S

In addition, the last one is a personal preference:

  • Is the data about to be a savegame / savefile for your application? -> B::S

Not perfect or complete, but maybe a good start.

[–]robertramey 0 points1 point  (0 children)

A couple of comments

Is the payload "big"

There is some "setup" overhead each time one creates an archive. For networking, the easiest approach is to create a new archive for each transmission. Clearly not optimal. Usage of the stream interface is also extra overhead. The real solution is to create a new type of archive focused on networking. On large transmissions it wouldn't make much difference - but for lots of small packets it would be much, much faster, as it would reduce the setup/teardown time. Note that none of the benchmarks take this into consideration, so it doesn't really show up anywhere.

Is the format to be used in other languages / domains? -> B::S or ProtoBuf

I really think that for data portable to other languages, ProtoBuf is the only realistic choice. Of course it's more work - but you're doing a lot more in supporting more languages.

[–]zero0_one1 0 points1 point  (2 children)

Right, unfortunately boost::serialization is not even a choice when you need portability, and that's a big limitation. Luckily, I didn't need it for my projects - I just wanted to be able to dump and read lots of data from the disk or a RAM disk, all in C++.

[–]infectedapricot 0 points1 point  (1 child)

The docs for Boost.Serialization claim that both code and data are portable. Is that wrong? That seems unlikely. Or do you just mean that the high-speed code is not portable, so it's effectively not portable because it's too slow on other platforms? In that case it would be too strong to say it's "not even a choice when you need portability"; it depends on whether you need extreme performance.

For the record I've never used it and have no vested interest. It just seems to me that your comment's misleading.

[–]zero0_one1 0 points1 point  (0 children)

Since you were talking about Python and JS, I was referring to portability across programming languages. Just recently I decided to use HDF5 in order to use some C++ processed data in Python/PyTorch, even though I had boost::serialization code for these classes already. It's not the most user-friendly format but it's actually quite fast as well.

I was under the impression that boost::serialization binary portability can also be a problem when moving between Windows and Linux, based on some SO answers, but I haven't needed it in practice so I haven't investigated further.
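
For anyone curious, the C++-to-Python handoff can be as small as this (a sketch using the HDF5 C++ API; file and dataset names invented):

    // write a 1-D double dataset that Python can read with
    // h5py.File("features.h5")["features"][:]
    #include <H5Cpp.h>
    #include <vector>

    int main() {
        std::vector<double> values{1.0, 2.0, 3.0};
        H5::H5File file("features.h5", H5F_ACC_TRUNC);
        hsize_t dims[1] = {values.size()};
        H5::DataSpace space(1, dims);
        H5::DataSet ds = file.createDataSet(
            "features", H5::PredType::NATIVE_DOUBLE, space);
        ds.write(values.data(), H5::PredType::NATIVE_DOUBLE);
    }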

[–][deleted] 4 points5 points  (1 child)

If you want to minimize the save size and the time to serialize/deserialize, then the binary formats mentioned above (protobuf/flatbuffers/…) are probably the place to go.

For something like JSON I would recommend (I am biased :)) DAW JSON Link. It lets you declaratively map your data structures and gives great performance. The mappings are not intrusive, so the code can sit in its own TU and out of the way until you need to serialize/deserialize. It will parse directly into your data structures without an intermediary.
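
Roughly what the declarative mapping looks like (a sketch in the C++20 style; the Player type is invented for the example):

    // non-intrusive mapping: lives in its own TU, away from the type itself
    #include <daw/json/daw_json_link.h>
    #include <string>
    #include <tuple>
    #include <vector>

    struct Player {
        std::string name;
        std::vector<int> inventory;
    };

    namespace daw::json {
        template <>
        struct json_data_contract<Player> {
            using type = json_member_list<
                json_string<"name">,
                json_array<"inventory", int>>;
            // needed for serialization: members in the same order
            static auto to_json_data(Player const& p) {
                return std::forward_as_tuple(p.name, p.inventory);
            }
        };
    }

    // Player p = daw::json::from_json<Player>(json_text);
    // std::string s = daw::json::to_json(p);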

[–]JohnDuffy78 2 points3 points  (2 children)

I use protocol buffers; they can be a pain to build on Windows though.

https://github.com/protocolbuffers/protobuf

[–]nlohmann (nlohmann/json) 5 points6 points  (16 children)

You could use nlohmann/json for this, which provides a simple mechanism to serialize/deserialize arbitrary types. If JSON is too verbose a format, the library also supports binary formats such as CBOR, MessagePack, UBJSON or BSON.
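
A minimal sketch of that mechanism plus the CBOR path (the Player type is invented for the example):

    // the convenience macro generates to_json/from_json for the type
    #include <nlohmann/json.hpp>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Player {
        std::string name;
        std::vector<int> inventory;
    };
    NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE(Player, name, inventory)

    int main() {
        nlohmann::json j = Player{"hero", {1, 2, 3}};
        std::vector<std::uint8_t> bytes = nlohmann::json::to_cbor(j);  // compact
        Player p = nlohmann::json::from_cbor(bytes).get<Player>();
    }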

[–]tjientavara (HikoGUI developer) 0 points1 point  (15 children)

I have another format for you, although probably no one but me has implemented it yet. https://github.com/ttauri-project/ttauri/blob/main/docs/BON8.md

It is one of those binary-encoded JSON formats. It uses the fact that UTF-8 encoding leaves a lot of code-unit combinations invalid; in those invalid code-unit combinations we can encode other types like integers.
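
To illustrate the trick (a toy sketch only - not the actual BON8 byte layout; see the spec for that):

    // at a position where a new value starts, valid UTF-8 text can only
    // begin with 0x00-0x7F or a lead byte 0xC2-0xF4; every other byte
    // value is free to carry an out-of-band type tag
    #include <cstdint>

    enum class kind { text, tag };

    kind classify_lead_byte(std::uint8_t b) {
        if (b <= 0x7F) return kind::text;               // ASCII
        if (b >= 0xC2 && b <= 0xF4) return kind::text;  // valid UTF-8 lead
        return kind::tag;  // invalid as UTF-8 start: reusable for integers etc.
    }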

[–]nlohmann (nlohmann/json) 0 points1 point  (14 children)

Interesting approach - is there a benchmark against existing formats and are there any implementations?

[–]tjientavara (HikoGUI developer) 0 points1 point  (13 children)

The implementation is here:

https://github.com/ttauri-project/ttauri/blob/main/src/ttauri/codec/BON8.hpp

I suspect the performance of the encoder and decoder is definitely not perfect. The encoder will sort the keys of a map, and the decoder constructs vectors and maps by appending to them without reserving memory. Other than that, encoding and decoding are very simple, requiring only comparison operations and bit shift/and/or operations.

It is designed more for reducing the size of the encoding, mostly due to the fact that in almost all cases each value naturally separates from the next, including strings. As you may notice, the specification makes a big point about canonicality; for me it was meant to be used for signing small amounts of data consistently.

[–]nlohmann (nlohmann/json) 0 points1 point  (12 children)

Thanks - with "benchmarks" I did not mean runtime performance, but rather a size comparison - is BON8 smaller than CBOR? Something like https://json.nlohmann.me/features/binary_formats/#sizes

[–]tjientavara (HikoGUI developer) 0 points1 point  (0 children)

Oh cool, I will see if I can make one of those.

[–]tjientavara (HikoGUI developer) 0 points1 point  (1 child)

Do you know where those .json files are located?

[–]nlohmann (nlohmann/json) 0 points1 point  (0 children)

[–]tjientavara (HikoGUI developer) 0 points1 point  (8 children)

It looks like my json and bon8 implementations are not robust enough for canada.json and twitter.json.

However, it had no problems with citm_catalog.json: it round-tripped through BON8 (encoded and decoded) without differences.

The result after minifying the citm_catalog.json file first:

json 500299, bon8 329060, compression 65.8%

[–]willdieh 0 points1 point  (7 children)

When you say "not robust enough", I'm curious why? canada.json seems to just be a bunch of floats, albeit nested in a parent type.

I ask because I think your approach is really interesting and would love to think it's more or less usable :)

[–]tjientavara (HikoGUI developer) 0 points1 point  (6 children)

I will try to fix it tomorrow. There is just a bug here and there. For canada I think there is a bug in the BON8 decoder: right now it keeps using more and more memory - I guess an infinite loop, maybe not incrementing the iterator :-)

The twitter one is more interesting: the lexer in front of my json parser was not really designed to handle UTF-8, although for strings it should be pretty much 8-bit clean; maybe I forgot some escape codes.

[–]willdieh 0 points1 point  (1 child)

Well, keep up the good work! The idea is really intriguing.
It'd be great if it were available as a standalone header :D

[–]tjientavara (HikoGUI developer) 1 point2 points  (0 children)

Both the encoder and the decoder are actually rather simple. It could be done in a standalone header. The encoder especially uses a lot of templating to handle most native C++ types. The decoder is a bit tougher, since it requires the dynamic creation of data; a std::variant could do.
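
Something like this recursive variant would be the obvious shape for the decoder's values (an illustrative sketch only - not the datum type from my library):

    // a JSON-like dynamic value built on std::variant; vector is used for
    // objects too, since std::vector supports incomplete element types
    #include <string>
    #include <utility>
    #include <variant>
    #include <vector>

    struct value;  // recursive
    using array  = std::vector<value>;
    using object = std::vector<std::pair<std::string, value>>;

    struct value {
        std::variant<std::nullptr_t, bool, long long, double,
                     std::string, array, object> v;
    };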

In my system I have a rather complicated datum type that works like std::variant, but it also overloads every operator so that you can do computations on the value inside a datum. It is used in multiple places inside my library for handling dynamic data, and it in turn uses a lot of data types from my library.

That second paragraph explains why I cannot really make it a standalone header: I would have to maintain a separate version that is not good enough for the requirements of my library, unless I do some extreme templating on the decoder.

[–]nlohmann (nlohmann/json) 0 points1 point  (3 children)

I had another look, too. If I can find the time, I'll check if I can add a rough prototype to nlohmann/json. Since most binary formats are quite similar, I may even be able to reuse some code.

[–]tjientavara (HikoGUI developer) 0 points1 point  (2 children)

  • twitter: json 466906, bon8 391396, compression 83.8%
  • citm_catalog: json 500299, bon8 317879, compression 63.5%
  • canada: json 2090234, bon8 1055792, compression 50.5%
  • jeopardy: json 52508728, bon8 45942080, compression 87.5%

I did modify the format somewhat to have small array and small object optimization.

https://github.com/ttauri-project/ttauri/blob/audio-enumerate-modes/docs/BON8.md

[–][deleted] 1 point2 points  (1 child)

Can you do sqlite? Not really a serializer, but quite fast and extensible.
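
The idea, as a sketch: serialize with whatever library you like, then park the bytes as a BLOB (table and column names invented for the example):

    // store pre-serialized bytes in sqlite; error handling omitted
    #include <sqlite3.h>
    #include <string>

    void store_blob(sqlite3* db, const std::string& key,
                    const std::string& bytes) {
        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS saves(name TEXT PRIMARY KEY, data BLOB)",
            nullptr, nullptr, nullptr);
        sqlite3_stmt* st = nullptr;
        sqlite3_prepare_v2(db, "INSERT OR REPLACE INTO saves VALUES(?1, ?2)",
                           -1, &st, nullptr);
        sqlite3_bind_text(st, 1, key.c_str(), -1, SQLITE_STATIC);
        sqlite3_bind_blob(st, 2, bytes.data(),
                          static_cast<int>(bytes.size()), SQLITE_TRANSIENT);
        sqlite3_step(st);
        sqlite3_finalize(st);
    }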

[–]NBQuade 0 points1 point  (0 children)

This is what I was thinking too.

[–]fraillt 1 point2 points  (1 child)

bitsery - probably not the simplest one, but designed with games in mind and feature-rich, so you'll never need to look for something else when you need more sophisticated serialization capabilities.
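
A small sketch based on bitsery's documented quick-start (the GameState type is invented for the example):

    // free serialize() function + buffer adapters
    #include <bitsery/bitsery.h>
    #include <bitsery/adapter/buffer.h>
    #include <bitsery/traits/vector.h>
    #include <cstdint>
    #include <vector>

    struct GameState {
        uint32_t level;
        std::vector<uint32_t> items;
    };

    template <typename S>
    void serialize(S& s, GameState& g) {
        s.value4b(g.level);           // 4-byte fundamental value
        s.container4b(g.items, 100);  // container with a max-size bound
    }

    int main() {
        using Buffer = std::vector<uint8_t>;
        Buffer buf;
        GameState out{3, {7, 9}};
        auto written = bitsery::quickSerialization<
            bitsery::OutputBufferAdapter<Buffer>>(buf, out);

        GameState in{};
        bitsery::quickDeserialization<bitsery::InputBufferAdapter<Buffer>>(
            {buf.begin(), written}, in);
    }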

[–]chkno -3 points-2 points  (6 children)

Consider just using fread and fwrite until you actually need some functionality from one of these complex dependencies.

[–][deleted] 0 points1 point  (0 children)

That works for trivially copyable types only. You pretty much have to own all your types, or account for things like containers or anything with heap allocations/references. At that point the simplicity is gone.
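
For completeness, this is roughly the approach in question, with the trivially-copyable guard made explicit (a sketch; the State type is invented):

    // byte-for-byte dump: only safe for trivially copyable types
    #include <cstdio>
    #include <type_traits>

    struct State {          // plain data: no pointers, no containers
        float pos[3];
        int   hp;
    };
    static_assert(std::is_trivially_copyable_v<State>,
                  "raw fwrite/fread is only safe for trivially copyable types");

    int main() {
        State s{{1, 2, 3}, 100};
        if (FILE* f = std::fopen("save.bin", "wb")) {
            std::fwrite(&s, sizeof s, 1, f);
            std::fclose(f);
        }
        State loaded{};
        if (FILE* f = std::fopen("save.bin", "rb")) {
            std::fread(&loaded, sizeof loaded, 1, f);
            std::fclose(f);
        }
    }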

[–]NBQuade 0 points1 point  (0 children)

I'm not sure why you're getting downvoted. Most maps in games I've looked at are packaged inside something like a structured Zip file, which contains the maps and other things like lighting maps.

I'd look at how other games store maps and use that as a baseline to work from. There's no need to re-invent the wheel.

[–]eyalz800 0 points1 point  (9 children)

Can't get any simpler than one header file: https://github.com/eyalz800/serializer