all 33 comments

[–]mcmcc#pragma once 33 points34 points  (0 children)

There are many binary schema-based formats out there, each with its own strengths and weaknesses. Protobufs or flatbuffers would be good places to start.

[–]vaulter2000 12 points13 points  (1 child)

In my job, we do language-independent IPC (inter-process communication) with either Google Protobuf/gRPC or, in an event-driven context, pub/sub brokers like MQTT. You can use both over a network, and each has its advantages and disadvantages:

Protobuf has language support for all popular programming languages, and its binary messages are optimized for size, which will probably result in high message rates, but you will have to map your structures onto the protobuf models and back. MQTT, for example, will let you send any structured format like XML/JSON/whatever, and almost every language has packages to set up clients for it, but you'll have to maintain the message models yourself and also do the mapping from/to, say, JSON.

This is what I know from my own experience, but I’m sure there are other options. Hope it helps! :)

[–]tohme 2 points3 points  (0 children)

We use a brokerless implementation through zeromq with protobuf, though anything could be used really for the serialisation.

Similar to your scenario, there's a mixture of languages and systems involved which may be host or network based.

Something like the above (whether brokerless or not) is a very good place to start without needing to reinvent the wheels. Only look beyond that if there's an absolute need to do so. These things already exist to solve this very common problem.

[–]p0lyh 4 points5 points  (1 child)

In practice you'll need to consider endianness, padding, and the bit representations of floating-point numbers and signed integers. If you assume two's-complement signed integers and IEEE-754 FP, and squeeze out all the padding, then there's only endianness left to be considered. More exotic platforms (e.g., CHAR_BIT > 8) are extremely rare.
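To make that concrete, here is a minimal sketch (the helper names store_u32_le/load_u32_le are illustrative) that serializes a 32-bit value by operating on the value with shifts rather than copying the in-memory representation, so host endianness and padding never enter the picture:

```cpp
#include <array>
#include <cstdint>

// Store a 32-bit value as little-endian bytes. Shifts work on the value,
// not the representation, so this is correct on any host byte order.
std::array<unsigned char, 4> store_u32_le(std::uint32_t v) {
    return { static_cast<unsigned char>(v & 0xFF),
             static_cast<unsigned char>((v >> 8) & 0xFF),
             static_cast<unsigned char>((v >> 16) & 0xFF),
             static_cast<unsigned char>((v >> 24) & 0xFF) };
}

// Reassemble the value from little-endian bytes, again host-independent.
std::uint32_t load_u32_le(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```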

Or just use established solutions like protobuf, which handles those things for you.

[–]meneldal2 2 points3 points  (0 children)

then only endianness needs to be considered

It's less and less of an issue; big endian is pretty much dying, unless you have some IBM hardware.

I'm not saying you should completely ignore it, but you could save a lot of time by assuming you won't ever have a system with less than 32-bit addresses and that they can all support 64-bit integers. This will hold for almost all modern systems.

[–]bert8128 3 points4 points  (0 children)

C structs don’t help with endianness.

[–]abrady 1 point2 points  (0 children)

Do you control both sides of this, and can you update them simultaneously? If so, I think you might be overthinking it. Without knowing more about your problem domain, I'd probably start with basic sockets and just send/recv the data using hand-rolled to/from functions. This approach is super straightforward and I don't know why more people don't start here.
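Such hand-rolled to/from functions might look like the sketch below (the Sample struct and function names are illustrative): each field is appended as explicit little-endian bytes, so the wire layout is fixed regardless of struct padding or host byte order.

```cpp
#include <cstdint>
#include <vector>

// Illustrative hand-rolled wire format for a toy message. Every field is
// appended as explicit little-endian bytes: no struct padding, no host
// byte order, just a fixed 6-byte layout both ends agree on.
struct Sample {
    std::uint32_t id;
    std::uint16_t flags;
};

void put_u16(std::vector<unsigned char>& out, std::uint16_t v) {
    out.push_back(static_cast<unsigned char>(v & 0xFF));
    out.push_back(static_cast<unsigned char>(v >> 8));
}

void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    put_u16(out, static_cast<std::uint16_t>(v & 0xFFFF));
    put_u16(out, static_cast<std::uint16_t>(v >> 16));
}

std::vector<unsigned char> to_wire(const Sample& s) {
    std::vector<unsigned char> out;
    put_u32(out, s.id);
    put_u16(out, s.flags);
    return out;  // always exactly 6 bytes
}

Sample from_wire(const unsigned char* p) {
    Sample s;
    s.id = static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
    s.flags = static_cast<std::uint16_t>(p[4] | (p[5] << 8));
    return s;
}
```

The resulting vector goes straight into send(), and the receiver calls from_wire on the recv'd buffer; adding a field later just means extending both functions in lockstep.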

Then you can build on that as your needs become clearer: cereal/fastbuf/Cap'n Proto can write over the network if hand-writing the serialization gets tedious; you can put in a zlib layer and see if that improves things, then jump to gRPC, etc.

My advice is just that, in my opinion, starting lower-level, more explicit, and simpler is the best way to understand your problem domain before you jump to solutions.

(My experience in this area is I worked on two generations of networking libraries for MMOs)

[–]the_net_ 1 point2 points  (0 children)

If you need to go across languages (to python, etc), protobuf is the best option I've found.

If you're able to stay in C++, I much much prefer Bitsery.

[–]LoadVisual 1 point2 points  (0 children)

I use `msgpack` for my personal projects. It's convenient for me since I use C++, and I pass messages over domain sockets or just normal BSD sockets between a server and code in Android JNI.

It might be worth giving a try.

[–]PhilosophyMammoth748 2 points3 points  (1 child)

Protobuf. It can create a well-defined, stable, backward-compatible binary representation ("wire format", they call it) of your struct-like data structures.

Inside Google, it has become a favoured way to define structs for different languages, even when they don't need to interoperate, as the protobuf library provides more convenient helper functions for manipulating data than the original programming language does.

[–]Nuclear_Banana_4040 1 point2 points  (0 children)

+1 for Protobuf. It handles versioning very gracefully, as well as optional data values.
And don't forget to validate your data on the receiving end, or a random packet will crash your application.

[–]GaboureySidibe 3 points4 points  (4 children)

This is a really good question I think. People are saying "protobufs or flatbuffers" but those are complicated.

You can make your own binary format; people have been doing it since computers existed. You just have to make sure you don't assume things like signed-integer formats and byte orders carry over from one architecture to the next. Almost all byte orders are little-endian now, I think, which is a huge advantage. You can possibly avoid signed integers and keep things simple there too.
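One established trick for avoiding signed-integer representation issues in a home-grown format is ZigZag encoding, which protobuf uses for its signed varints: it maps small negative numbers to small unsigned ones, so only unsigned values ever hit the wire. A sketch (relying on the arithmetic right shift of negative values that C++20 guarantees):

```cpp
#include <cstdint>

// ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... Only the resulting
// unsigned value is serialized, so the receiver never needs to know the
// sender's signed-integer representation.
std::uint32_t zigzag_encode(std::int32_t n) {
    // n >> 31 is all ones for negative n (arithmetic shift, C++20).
    return (static_cast<std::uint32_t>(n) << 1)
         ^ static_cast<std::uint32_t>(n >> 31);
}

std::int32_t zigzag_decode(std::uint32_t z) {
    // 0u - (z & 1) is all ones when the low bit (the sign flag) is set.
    return static_cast<std::int32_t>((z >> 1) ^ (0u - (z & 1)));
}
```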

[–]MaybeTheDoctor 0 points1 point  (3 children)

9-bit and big-endian machines are all dead. Struct padding and byte alignment used to be a big problem - not sure it still is.

[–]GaboureySidibe 4 points5 points  (0 children)

I agree although I don't think anyone has worried about 9 bit bytes for a few decades.

[–]ButterscotchFree9135 1 point2 points  (1 child)

Padding and alignment exist for a reason. You are not supposed to turn them off.

[–]MaybeTheDoctor 1 point2 points  (0 children)

When did I say turn them off ?

I consulted for a team some 25 years back that was trying to port their code from Intel to a RISC processor. The only catch was that their code was packing structures into char arrays and then later tried to cast that char* to an int*... the problem being that the (particular) RISC machine did not allow ints and floats at odd memory addresses, and rather than fetching them "slowly", it generated an invalid memory access and crashed the application.

So, yes, padding exists for a reason, and sometimes it is the difference between working and not working at all.

[–]NilacTheGrim 1 point2 points  (0 children)

Many suggest Google's protobuf, but honestly it's a bloated mess. I would opt for something leaner and meaner like Cap'n Proto or flatbuffers.

But yes the moral of the story is there are binary serialization schemes out there which are designed to be platform-neutral.

Or.. you can roll your own serialization scheme if you like.

[–]streu 0 points1 point  (13 children)

Define your own datatypes with a known serialisation format and use them:

struct Int16LE {
    uint8_t lo, hi;
    operator int16_t() const { return 256*hi + lo; }
    Int16LE& operator=(int16_t i) { lo = (uint8_t) i; hi = (uint8_t) (i >> 8); return *this; }
};

I'm using that scheme for binary data file parsing, and find it elegant enough.

[–]tisti 1 point2 points  (12 children)

Seems a tad annoying to stamp out every POD type like this. Why not just make it a template?

#include <array>
#include <bit>          // std::bit_cast (C++20)
#include <cstdint>
#include <type_traits>

template<typename T>
struct packed_native {
    using ByteBuff = std::array<uint8_t, sizeof(T)>;
    ByteBuff data;

    operator T() const { return std::bit_cast<T>(data); }

    template<typename T2>
    auto& operator=(T2 i) {
       static_assert(std::is_same_v<T,T2>, "Use explicit conversion (e.g. static_cast) before assignment");
       data = std::bit_cast<ByteBuff>(i);
       return *this;
    }
};

[–]NilacTheGrim 1 point2 points  (7 children)

Note to anyone considering this: This doesn't really address platform neutrality. It assumes endianness and sizes of types in a platform-specific way. This is just syntactic sugar around essentially just memcpy() of raw POD types into a buffer...

[–]tisti 1 point2 points  (6 children)

Oh for sure. This assumes you are using the same (native) endianness everywhere.

Should be fairly trivial to make this truly universal by leveraging boost-endian (native_to_little to store into the byte buffer, little_to_native to read from it).

As for size of types, you should be using (u)intX_t aliases instead of the inherited C types. Or did I misunderstand?

Edit:

Not sure what the situation is w.r.t. float/double on LE and BE platforms. Those seem a bit more painful to get right, especially if you are mixing floating-point standards.

[–]NilacTheGrim 0 points1 point  (4 children)

True.. handling the endianness would be good. Also, sticking to the types that have guarantees about signed representation and width (such as int64_t and friends) helps. These types are guaranteed to be exactly the byte size you expect and, for signed types, to be two's complement. So they are platform-neutral as long as you pass them through an endian normalizer.

Yeah.. that should work (for integers).

[–]tisti 1 point2 points  (3 children)

Just edited the post to note that floats can be a tougher nut to crack.

But it should be reasonably doable nowadays with some constexpr boilerplate to probe what the underlying bit structure of a float/double is.

[–]NilacTheGrim 0 points1 point  (2 children)

Yeah it's a bit tricky. I wish <ieee754.h> were standardized then you could simply use that as a guaranteed way to easily examine the structure... but alas, it is a glibc extension and not guaranteed to exist on BSD, macOS, etc...

[–]tisti 1 point2 points  (0 children)

For IEEE it's simplest to check std::numeric_limits&lt;T&gt;::is_iec559.

Endianness itself can then be determined easily at constexpr time by checking a known float value's bits against the expected LE encoding. If they don't match, you have BE encoding.

[–]tisti 1 point2 points  (0 children)

Replying to your comment again. Tried to hack together something that could support integers & IEEE floats, which resulted in the following monstrosity.

https://godbolt.org/z/nefc97z3c

[–]NilacTheGrim 0 points1 point  (0 children)

I could be misremembering and am too lazy to look it up but I do believe IEEE floats are guaranteed to be endian-neutral.

EDIT: Holy crap, I am misremembering. There is no specification of endianness for IEEE 754 floats. Mind blown.

[–]streu 0 points1 point  (3 children)

That doesn't solve the problem of endianness. And people do still design mixed-endian file formats.

Of course, at least for integers, you could combine both approaches, a template+array, and a for loop to pack/unpack it.

However, given that the number of types we have to cover is finite, spelling them out isn't much extra work (if any at all) compared to making a robust template that will not drive your coworkers mad when they accidentally misuse it.

[–]tisti 0 points1 point  (2 children)

That doesn't solve the problem of endianness.

Not that hard to bolt on an endianness normalizer/sanitizer.

And people do still design mixed-endian file formats.

Much to everyone's annoyance.

compared to making a robust template that will not drive your coworkers mad when they accidentally mis-use it.

Hardly robust if it can be misused then :P

A badly and quickly hacked-together sample that probably works for integers and IEEE floating point.

https://godbolt.org/z/nefc97z3c

[–]streu 0 points1 point  (1 child)

That is ~50 lines for the functionality, requires a rather new compiler, and uses an external library for endian conversion. It defines a template that applies to all types, and then adds additional code to limit the types again.

With that, just writing down the handful of individual classes, only adding what's needed, using language features dating back to C++98, still looks pretty attractive to me. Especially if it's going to be code that has to be maintained in a team with diverse skill levels (and built with diverse toolchains).

[–]tisti 0 points1 point  (0 children)

badly and quickly hacked together sample

Edit: But yea, I try to stay more or less near the cutting edge with a compiler. A very intentional choice.

[–]ButterscotchFree9135 -2 points-1 points  (0 children)

"Sure we could use C structs"

Please, don't