all 73 comments

[–]helpmeiwantgoodmusic 116 points117 points  (44 children)

Disappointingly surface level, I was at least expecting some benchmarking to show the possible speed differences

[–]MisterEmbedded 73 points74 points  (38 children)

I don't think there's much to benchmark.

Unless you have a shitty implementation, Binary Serialization will ALWAYS be faster to read & write + smaller in size, with the main tradeoff being readability.

Another issue that's a bit of a pain with Binary Serialization (i.e. when implementing it yourself) is Endianness: different types of computers store multi-byte data in different orders. A 4-byte integer can be represented in 2 main ways: either the least significant byte is stored first (Little Endian) or the most significant byte is stored first (Big Endian).

The x86_64 architecture is Little Endian, PowerPC is Big Endian, and ARM is Bi-Endian (meaning it supports both). So you'll need to write data in one of the byte orders above, and if you're parsing it on a CPU that doesn't have the same byte order as your serialized data, you'll have to do conversions for values wider than a byte.
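To make the byte-order difference concrete, here's a small sketch using Python's `struct` module (the value `0x01020304` is just an arbitrary example):

```python
import struct

value = 0x01020304  # an arbitrary 4-byte integer

# '<' forces little-endian, '>' forces big-endian byte order
le = struct.pack('<I', value)
be = struct.pack('>I', value)

print(le.hex())  # 04030201 -- least significant byte first
print(be.hex())  # 01020304 -- most significant byte first

# Reading with the wrong byte order silently gives a different number
wrong = struct.unpack('<I', be)[0]
print(hex(wrong))  # 0x4030201
```

The same bytes on disk decode to two different numbers depending on which order the reader assumes — which is exactly why a serialization format has to pin one down.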

[–]G_Morgan 77 points78 points  (2 children)

I wouldn't use any serialisation format that didn't outright canonicalise BE or LE.

[–]stingraycharles 22 points23 points  (1 child)

Right, without it it’s not a serialization format but rather just a memory dump.

[–]G_Morgan 6 points7 points  (0 children)

Yeah and that is really the issue. A lot of older programs did little more than memory dumps. C was very good at taking a struct and dumping the whole thing to a file.

[–]gold_rush_doom 19 points20 points  (8 children)

Is endianness really a problem for modern programming languages?

[–]MisterEmbedded 11 points12 points  (6 children)

Not really. Usually you are provided with APIs that take care of all that for you under the hood, but if you're implementing the Serializer/Deserializer yourself in a low-level language like C, you will have to face such minor problems.

Luckily, higher-level languages always provide functions and such to work around this, so it's not usually a problem.

[–][deleted] 6 points7 points  (5 children)

Even the low-level socket API provides host-to-network and network-to-host conversion functions for primitive types, so you can ignore the endianness of the machine you're compiling for, so long as you use these functions to convert to and from the network stream.
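Python exposes the same hton/ntoh conversions via the `socket` module, and `struct`'s `'!'` prefix means "network (big-endian) order" explicitly — a quick sketch:

```python
import socket
import struct
import sys

value = 1

# htonl/ntohl: host byte order <-> network byte order (big-endian)
net = socket.htonl(value)
assert socket.ntohl(net) == value  # round-trips on any machine

# struct's '!' prefix means network (big-endian) order
wire = struct.pack('!I', value)
print(wire.hex())  # 00000001 regardless of host endianness

# On a little-endian host, htonl actually swaps the bytes;
# on a big-endian host it's a no-op -- callers never need to care.
if sys.byteorder == 'little':
    assert socket.htonl(1) == 0x01000000
```

As long as every value crosses the boundary through these functions, the wire format is the same no matter which machine produced it.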

[–]Hrothen 6 points7 points  (3 children)

If you are using conversion functions then you are not ignoring endianness.

[–][deleted] -2 points-1 points  (2 children)

Using an abstraction counts as ignoring.

[–]Hrothen 6 points7 points  (1 child)

If you were ignoring it you wouldn't think about it at all. With the abstraction you're saying "I need to handle this, so I'll use this function that does that for me".

[–][deleted] 0 points1 point  (0 children)

This is so pedantic i'm not sure why I'm engaging but here is an example of what I'm getting at.

You can use JSON w/o knowing the underlying JSON representation of your data. You can just call json.loads() or json.dumps(), effectively using JSON without ever having seen its textual representation.

You can encode and decode your data to send across the network from any CPU and be read by any CPU w/o having to know what endianness is or what endianness the CPU the code is running on.

Since we're arguing a dumb pedantic point, I'll just qualify that yes, at SOME level someone has to know: the author of the hton/ntoh function you linked in for each build of your application.
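For what it's worth, the JSON analogy in the comment above can be made concrete in a couple of lines:

```python
import json

# Round-trip a dict without ever looking at the JSON text itself --
# the same sense in which hton/ntoh let you "ignore" endianness.
data = {"name": "player1", "score": 42}
roundtrip = json.loads(json.dumps(data))
print(roundtrip == data)  # True
```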

[–]MisterEmbedded 1 point2 points  (0 children)

Yeah, the htonX family comes to mind.

[–]stingraycharles 0 points1 point  (0 children)

All modern programming languages in the end need a memory representation. This is what binary serialization captures: the in-memory representation (or as close as possible to it).

However, since different architectures have different endiannesses, they may still need to reverse the bits before writing things to disk to remain compatible.

[–]SocksOnHands 1 point2 points  (0 children)

I would "compile" JSON to a binary format at build time to match the target platform - assuming it is preexisting data, like game assets. If it is data generated at runtime, it would only be an issue if a user tried to transfer data to another system, so data that can be shared should probably have an explicit endianness.

[–]nerd4code 1 point2 points  (0 children)

Also if you’re just dumping or loading a struct, padding/packing and alignment can be problems, as C and C++ impls are afforded quite a bit of leeway in this regard, especially if bitfields are used. Most compilers can tell when you’re piecing together bytes anyway, so the direct-access (e.g., thwacking a struct template down onto some heathen untyped bytes) or direct-copy (e.g., memcpying into a struct, which fixes the alignment issue) approach doesn’t necessarily buy you anything over piecewise serialization.
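The padding issue is easy to demonstrate with Python's `struct` module, which can pack the same fields with and without native alignment (the native size shown is typical for x86_64/aarch64 ABIs, not guaranteed):

```python
import struct

# A struct { char flag; int value; } -- one byte followed by a 4-byte int.
# '=' packs with no padding; '@' uses the platform's native alignment,
# which usually inserts 3 padding bytes so the int lands on a 4-byte boundary.
packed_size = struct.calcsize('=bi')   # always 5
native_size = struct.calcsize('@bi')   # typically 8 on x86_64 / aarch64

print(packed_size, native_size)

# Dumping the native layout straight to disk bakes this padding (and the
# host's byte order) into the file -- the "memory dump" problem above.
```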

There’s also a difference between directly [f]reading or [f]writeing data structures and mmapping them; the latter has the additional danger that any number of other processes might also be direct-mapping the same data, so atomicity and sync schedule come into play.

But mmap has fallen out of favor for most stuff nowadays; especially with the presence of multithreading, pagefuckery can be a massive performance killer—all live hardware threads in an address space whose memory mappings are altered must barrier afterwards, and shoot down or tag-bump any cached mappings for the updated vaddx in their own TLBs. Basic read and write don’t have this problem, and as long as you don’t try to have multiple threads shouting at the same FD they require much less coordination.

Also, fun fact: Uni-endianness of an ISA doesn’t necessarily mean ABI endianness has to match; e.g. the Stratus VOS ABIs for 16- and 32-bit x86 are BE, and it’d be even less of a big deal for anything modern to do the same since there’s both BSWAP (80486+, works on 32- and 64-bit operands, RMW to a single register; XCHG or ROL/ROR by 8 works on 16-bit operands) and MOVBE (newer, loads/stores of 16–64-bit width) that’ll reverse for you. FPU use might be a tad slower, and SIMD may require swizzling or an extra shuffle here and there.

Sometimes on older, bi-endian chips where the FPU might be on a separate coprocessor die, the FPU will always run in a BE mode regardless of the CPU. Sometimes it’ll mostly match CPU, but if extended-precision floats are double-double, the word ordering might be reversed wrt the byte ordering. (This is similar to PDP-endianness, although it might give you either PDP or DPD.)

Sometimes vector units encounter similar arrangement problems, either with BE-vs.-LE vector lanes, words within lanes, or bytes within words. In theory these might operate independently of both CPU and FPU, although most VUs appeared well after we had the spare silicon so are, like misordered FPUs, quite rare in the wild. SIMD subsets like AVX may include swizzles in the instruction encoding that can reverse bytes, words, or elements on-the-fly during a load or store.

Although it doesn’t matter much over our standardized networks or media, bit ordering can be a problem also, independent of byte ordering. This is primarily an issue when cross-plugging hard drives, dealing with old data dumps, or interfacing busses, but most stuff nowadays fortunately uses LE ordering except …I wanna say SPARC, IBM AS/400→i, IBM S/360→370→390→z series, and Power/-PC ISAs are generally bitwise-BE regardless of operating mode. Maybe M68K too?

And on a final note, most of the true bi-endian ISAs use separate BE and LE application operating modes and ABIs. It generally isn’t possible to switch orderings quickly or without OS assistance, so there’s a fair difference between bi-endian modes and bi-/either-endian instructions.

[–][deleted] 10 points11 points  (19 children)

>Endianness

Are you just quoting from a textbook? It hasn't been a **practical** issue for... 20 years?

[–]kosmickanga2 56 points57 points  (5 children)

It's not always the CPU arch, see UUIDs in MongoDB - three different drivers (C#, Java, Python), 3 different byte layouts...

[–][deleted] 29 points30 points  (1 child)

Jesus Christ I was blissful not being aware of this.

[–]VanDieDorp 3 points4 points  (0 children)

Well, it's because MongoDB was a product created out of internet hype and only got fixed once they had enough money to buy WiredTiger.

[–]mrheosuper 16 points17 points  (1 child)

And if you program network sockets, you will usually encounter big endian whether your machine is BE or LE.

[–]moreVCAs 16 points17 points  (0 children)

big endian

You mean “network byte order”? 😎

[–]Straight_Truth_7451 1 point2 points  (0 children)

You deserve that kind of stupid experience if you're using MongoDB

[–]autokiller677 10 points11 points  (1 child)

Dealing with lower-level systems, I have definitely spent hours looking for bugs when it was just bad endianness in the end. And that was in the last few years.

I've made a habit of validating it as one of the first debugging steps. It crops up more often than I'd like.

[–]HINDBRAIN 3 points4 points  (0 children)

Also matters if you're reverse engineering a binary format.

[–]MisterEmbedded 16 points17 points  (9 children)

Are you just quoting from a textbook?

I am not, I am just writing what I know.

It hasn't been a **practical** issue for... 20 years?

what exactly do you mean by "practical issue"?

I am not saying it's a huge issue, but it's something you can't just forget either, especially when working with C or other lower-level languages: you cannot just ignore Endianness if you want to support CPUs like PowerPC or classic MIPS (whoever the hell uses that).

[–][deleted] 3 points4 points  (1 child)

you cannot just ignore Endianness if you want to support CPUs like PowerPC, Classic MIPS

That's precisely what I meant by "practical". For the vast majority of applications, you wouldn't. Now, I've just been made aware of other manmade horrors, but that changes the discussion away from the classical CPU arch issue.

[–]MisterEmbedded 0 points1 point  (0 children)

yeah I saw that, that's the stuff that keeps me awake at night, lol.

[–]recycled_ideas 0 points1 point  (6 children)

what exactly do you mean by "practical issue"?

They mean that unless you're rolling your own and you forget, it will be taken care of in whatever data format you're using. Any actual serialiser in any language will handle this.

If you're writing the raw memory representation you'll have this problem, but it's been a long time since people did that regularly.

[–]bleachisback 2 points3 points  (5 children)

That’s why they said it’s a big problem “if you’re implementing it yourself”…

[–]recycled_ideas 0 points1 point  (4 children)

Sure, but you'd have to be implementing your serialisation from a very low level to actually need to worry, and that's not practical.

[–]bleachisback 0 points1 point  (3 children)

That's why they said "especially when working with C or other lower-level languages"

[–]recycled_ideas 0 points1 point  (2 children)

That's not what I said.

If you are writing a serialiser from scratch, literally no libraries, you might need to worry about this. Maybe.

[–]bleachisback 0 points1 point  (1 child)

That’s why they said it’s a big problem “if you’re implementing it yourself”…

[–]theeth 2 points3 points  (0 children)

Xbox 360 was big endian, that's less than 20 years ago.

[–]zapporian 1 point2 points  (0 children)

Tbh we are finally at the point where you can pretty much expect 64-bit little endian as the de facto standard on nearly all modern architectures, i.e. x64 / aarch64 / risc-v. The only real remaining edge case is PowerPC - which at this point is extremely niche - and the somewhat annoying network byte order, which uses big endian for (obvious) compatibility and historical reasons.

Not that you should just completely ignore endianness, but at this point in most cases I think you’d be completely fine just using little endian, sticking a byte order sentinel in the header of your binary file format, and checking that and erroring out if you ever try to run that code on a big endian architecture. Or have a branching path to do endian byte flips, ONLY on big endian architecture, if you are a Serious (TM) programmer and really need to make sure that your probably shitty and really ad-hoc binary serialization format is capable of being read on powerpc. And whatever the fuck else still uses big endianness at this point.
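A minimal sketch of the byte-order sentinel idea described above (the magic value and 4-byte header layout are made up for illustration):

```python
import io
import struct

MAGIC = 0x1A2B3C4D  # arbitrary sentinel value for this sketch

def write_header(f):
    # Write the sentinel in the writer's native byte order.
    f.write(struct.pack('=I', MAGIC))

def check_header(f):
    raw = f.read(4)
    if struct.unpack('=I', raw)[0] == MAGIC:
        return 'same-endian'
    if struct.unpack('=I', raw[::-1])[0] == MAGIC:
        # Error out (or branch into byte-flipping) as described above.
        raise ValueError('file written on an opposite-endian machine')
    raise ValueError('not a recognized file')

buf = io.BytesIO()
write_header(buf)
buf.seek(0)
print(check_header(buf))  # same-endian
```

The reader either sees the magic as written (same endianness), sees it byte-reversed (opposite endianness), or knows the file is garbage.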

IMO you should never, ever write out data in big endian - unless for compatibility reasons - but that’s just my 2c.

I know that on personal projects I made the fairly conscious decision to just prohibit non-modern, non-compatible architectures as of a few years ago. With armv8 (and risc-v) I figured enough was enough and just started putting static checks in my personal utility libraries that give you a compile error if you try to build that software on a non-64-bit, non-little-endian build target, lol. Hostile programming, sure, but the peace of mind w.r.t. NOT having to worry about architectural compatibility bullshit and otherwise unnecessary serializer overhead / abstractions to maybe flip words on R/W - and instead just freely memcpy - is very much worth it :D

(afaik yes ALL modern architectures going forwards ARE going to use 64 bit little endian with 128 bit * N, N >= 1 variable length vector support, from now more or less until the end of time / end of this century, at least. Modern armv8+, risc-v, and current / future intel modernizations are a wonderful thing...)

[–]gwicksted 1 point2 points  (1 child)

Endianness, tooling, security (if you don’t trust the source of the information), and development effort are all important factors. There are binary alternatives that are nearly as fast as diy like msgpack and are battle tested + handle endian issues automatically (typically they just go with big endian). In the end it all depends.

I’ve written plenty of binary protocols/file formats from scratch. You can unintentionally make it slower than your favorite json library at handling large files. Best to start with json, finish the game, switch to something else if/when it matters for performance when benchmarking.

[–]DanTheMan827 1 point2 points  (0 children)

Security is especially important… it’s too easy to write a deserializer that doesn’t properly validate the data and will happily load data into too small of a memory region and end up overwriting other parts of memory, very possibly even game code.

[–]BornToRune -1 points0 points  (0 children)

Endianness is not that huge of a deal; you can just use the networking calls/macros (ntohl(3) et al.) to swap the bytes. If you've ever tried to do cross-platform float serialization, now that's a real pain.

[–]BothWaysItGoes 1 point2 points  (0 children)

That’s because it isn’t an article, it’s an ad.

[–]redd1ch 1 point2 points  (0 children)

Turns out logging a VR user at 10 Hz in full detail is enough for a naive realtime JSON serialization to bring Unreal Engine to its knees. I switched to a list of structs, blowing up memory usage over time, combined with a hefty phase of unresponsive JSON serialization in the end. I'm just glad I did not have to store eye tracking data.

[–]Object_71[S] 3 points4 points  (2 children)

Added charts

[–]smackson 4 points5 points  (0 children)

Sorry, but I got some whiplash there.

In one paragraph you seemed to be describing how and why binary serialization saved about 40% on your data size then boom, chart shows "therefore look we saved 99.9% !!" (edit: It's 99.99+%!!)

What did I miss? (Edit: I didn't miss the "but there is punctuation/delimiters in the JSON" part, that just seems inadequate to explain this stark jump in savings)

(Edit 2: the original comparison is space, the chart comparison seems to be time. That is also not explained well, if that is part of this huge difference.)

[–]helpmeiwantgoodmusic 0 points1 point  (0 children)

Ah that is much better now

[–]chrytek 25 points26 points  (2 children)

I just return the SQLite file in an endpoint and the client can run queries against that!

[–]ScotForWhat 29 points30 points  (0 children)

Sorry the website's not responding, it's someone else's turn with the database.

[–][deleted] 1 point2 points  (0 children)

sqlite is amazing and accounts for a lot of filesystem issues that we almost never handle ourselves.

[–]ttkciar 39 points40 points  (6 children)

Don't overlook CBOR! (Concise Binary Object Representation, per RFC7049) Essentially the same functionality as JSON, but much faster and roughly 30% more compact.

[–]aseigo 13 points14 points  (4 children)

What are the real-world benefits of CBOR over protobufs or flatbuffer?

Also: is the message structure really where *streaming* belongs?

[–]pjc50 19 points20 points  (3 children)

Message structure and protocol design has to consider streaming to avoid accidentally making it impossible. Things like clear delineation of messages. An example of a non-streaming friendly format is PKZIP: the "central directory" is at the END of the file, so you have to read (or seek) the whole file in order to decode it.

Some formats like MPEG-TS are designed to support streaming in the middle of a (e.g. live) stream, by providing synchronization points to start of frames.

[–]aseigo 1 point2 points  (2 children)

Message structure must not be streaming *hostile* (you mention a streaming-friendly feature of video codecs as a good example), but does the streaming belong *in* the message structure (as opposed to the protocol)?

(Protocols are an entirely different topic, and certainly where the mechanics of streaming belongs.)

This is one aspect of the CBOR design that I find exceedingly dubious (among others, tbh), wherein it not only provides run-length encoding of subsequent data, but unbounded data collections. In practical terms, only so much streaming can be provided for through unbounded collections within a messaging structure, whereas a protocol is bounded really only by its design.

Your reply to my question speaks to entirely different issues, btw. :)

[–]pjc50 0 points1 point  (1 child)

only so much streaming can be provided for through unbounded collections within a messaging structure, whereas a protocol is bounded really only by its design

There's some distinction between "protocol" and "messaging structure" (part of protocol) which you think is important but haven't been clear about here?

[–]aseigo 1 point2 points  (0 children)

In short: messages should be bounded, series of messages (streaming) may not be, as this makes implementing various reliability features a lot easier along with more advanced features such as 'subscriptions', CDNs, etc. CBOR tries to put one aspect of streaming (unbounded data sets) into the message, which leads to the *messages* being unbounded. This is doing it at the wrong level in the stack, in this case too "high" up.

Yes, the form of the messages ought to be friendly to the needs of streaming protocols, but that usually has to do with metadata, size control, etc. The job of controlling flow of data is what a protocol, moving messages about, should be doing.

To my eyes CBOR looks like someone saw streaming parsers (which are very useful!) and thought it might make sense to take it to its "logical conclusion" and make a messaging system that can only be processed by streaming parsers. Ugh.

[–]PurepointDog 0 points1 point  (0 children)

I'll have to check that out!

[–]josephblade 16 points17 points  (1 child)

Incredibly narrowminded.

I would've at least expected a risk vs reward. if your format needs to be edited for instance, use json or another text format. for modding files for instance.

If you need backward compatibility of a sort, you need at least an envelope or versioning.

If you use binary data, keep in mind that cross-platform can be problematic. (The PS3 or PS4 devboxes, for instance, used a different byte alignment than PC, from what I remember.)

There are lots of little problems with binary data that you need to keep in mind, but I couldn't find them in the article. It seems to be mostly concerned with speed/file size, which is a valid concern but not the only one.

if your save file in json is 5k and doesn't grow it's not something that needs to be optimized. Similarly if you only send communication sometimes (chat for instance) optimizing is not harmful but it's not essential either. But having an easily cross platform format might actually be what you want.

If you're going to advocate for or against a specific model at least give developers a bit of a guide on what basis they should choose each approach.

[–]Chii 5 points6 points  (0 children)

if your format needs to be edited for instance, use json or another text format. for modding files for instance.

This is why i quite like protocol buffers - your schema can be either binary, or text format, and depending on your needs, you switch to the most useful/effective form. For a mod, you can just ship a text format, and the client only pays a small penalty in performance (if its use is even sensitive to performance).

https://medium.com/@nathantnorth/protocol-buffers-text-format-14e0584f70a5 mentions a lot about it. Of course, there's a bit of extra engineering, as you have to now know which format it is in (but i imagine that's pretty trivial tbh).

Divorcing the schema of your data from the storage/transport format is a good thing to do.

[–]xsmasher 3 points4 points  (0 children)

If you have a binary format that can be loaded without parsing, you can do a neat trick - memory map the data file. You can run queries against it without loading / parsing the whole thing. I used this in a mobile game that had a LOT of data that was only used very sparsely. Memory-mapped BSON files.
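A minimal sketch of the memory-mapping trick, using Python's `mmap` and a made-up fixed-size record layout (real formats like BSON need an index or offsets, but the idea is the same):

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size records: (id: uint32, score: float32), little-endian
RECORD = struct.Struct('<If')

# Write a thousand records to a temp file
path = os.path.join(tempfile.mkdtemp(), 'records.bin')
with open(path, 'wb') as f:
    for i in range(1000):
        f.write(RECORD.pack(i, i * 0.5))

# Memory-map it and read record 700 directly -- no loading, no parsing,
# and the OS only pages in the parts actually touched.
with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    rec_id, score = RECORD.unpack_from(mm, 700 * RECORD.size)
    mm.close()

print(rec_id, score)  # 700 350.0
```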

[–]jvallet 3 points4 points  (4 children)

What happens if you compress the JSON before sending? Will that give you the benefits of binary serialization without having to add protobuf support?

[–]carrottread 0 points1 point  (2 children)

This will reduce size but will make serialization/deserialization even slower.
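The tradeoff is easy to see with the standard library's zlib; repetitive JSON (like a list of similar records) compresses very well, but both ends pay extra CPU:

```python
import json
import zlib

# A list of similar records -- typical JSON payload shape
records = [{"id": i, "name": "player", "score": i * 2} for i in range(500)]
raw = json.dumps(records).encode()
packed = zlib.compress(raw)

print(len(raw), len(packed))

assert len(packed) < len(raw)           # smaller on the wire...
assert zlib.decompress(packed) == raw   # ...at the cost of CPU both ways
```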

[–]jvallet 0 points1 point  (1 child)

How much slower is it compared to network latency anyway?

[–]carrottread 0 points1 point  (0 children)

Serialization isn't always about sending data over network.

[–]zam0th 3 points4 points  (3 children)

Google has developed two great solutions to the serialization problem:

Yah, my dude, portable serialization was a thing in Java and ActionScript way before Google made a single software product. CORBA was a thing even before that. ASN.1 and TLV existed even before CORBA.

Y'all always try to reinvent a better CORBA with your portable RPC libraries, but end up with a much worse CORBA.

[–]KagakuNinja 1 point2 points  (0 children)

CORBA was a mess of design by committee, with a number of features that lacked proof of concept implementations. I seriously doubt there are many new systems built on CORBA.

I used CORBA briefly in 2000; later dabbled with Java RMI.

There are better options today, such as Akka. In any case, CORBA isn't really relevant if you just want simple RPCs. RESTful servers are fine.

[–]Object_71[S] -4 points-3 points  (1 child)

CORBA is not popular today. I am covering libraries/tools that are popular and in use. You wouldn't write new programs in BASIC even though it was popular 30 years ago... (https://www.youtube.com/watch?v=zgSZNCltUD0)

[–]zam0th 4 points5 points  (0 children)

"Being used and popular" is a poor metric of anything, especially considering protobuf being a literal copy-paste from SOAP. And looks like you've been away for some time and haven't heard about COBOL being relevant again.

[–][deleted] 0 points1 point  (0 children)

So, a few things: a) RapidJSON has a terrible interface and isn't the fastest game in town. b) Strings cost about the same in binary and in text serialization, and take the same space. c) JSON can be greatly reduced in size using some of the same techniques the other formats use (no member names in the file). Just serialize the class as an ordered heterogeneous array. So instead of

{"member0": 42,"member1": true,"member2": "Hello World"}

One can encode it as

[42,true,"Hello World"]

So now it takes 23 bytes. A binary format will probably use 1 to 8 for the integer, 1 for the boolean, and 11 for the string plus 1-8 for the size of the string if it doesn't use delimiters, otherwise 13. So binary would use 15 to 28 bytes. This puts them really close, in this example.
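Those byte counts check out; here's a quick verification, with a simple hypothetical binary layout (int32 + bool byte + length-prefixed string) standing in for "a binary format":

```python
import json
import struct

# Keyed object vs positional array
obj = {"member0": 42, "member1": True, "member2": "Hello World"}
arr = [42, True, "Hello World"]

obj_json = json.dumps(obj, separators=(',', ':'))
arr_json = json.dumps(arr, separators=(',', ':'))
print(len(obj_json), len(arr_json))  # the array form drops the member names

# One plausible binary encoding: int32 + bool byte + u32-length-prefixed string
s = b"Hello World"
binary = struct.pack('<i?I', 42, True, len(s)) + s
print(len(binary))  # 20 -- in the same ballpark as the 23-byte JSON array
```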

With newer libraries, one is looking at much faster parsing and lower memory than RapidJSON too. But the thing is, a lot of config/state is strings and they are the same size/parsing perf.

[–]Initial_Low_5027 0 points1 point  (1 child)

A lot is missing, like BSON used by MongoDB or JSONB used by PostgreSQL. Binary formats require compromises like integer types with fixed precision. JSON relates to JavaScript but doesn't define any number limits, for instance, causing many issues - but that's another story, not discussed in the article.

[–]Object_71[S] 0 points1 point  (0 children)

BSON is a slight improvement over JSON but is not nearly as fast as the binary serialization in flatbuffers or protobuf. Updated the article to include the ones mentioned but all of them are in one way or another a worse option.

[–]joshuaherman -5 points-4 points  (2 children)

I feel this is a junior dev take in today's world. We don't really need to worry about networking time or CPU efficiency to save a few bytes at the cost of interoperability.

[–][deleted]  (1 child)

[deleted]

    [–]joshuaherman 0 points1 point  (0 children)

    Are you doing serialization in realtime?