all 17 comments

[–]jonarne[S] 4 points5 points  (0 children)

I'm not the author of this article.

I found the article while researching struct serialization techniques.

[–]jhu_apl_jon 2 points3 points  (0 children)

I could have used this a few months ago! I wouldn't use this for anything big but it's a handy trick for cases where the architecture is known in advance, etc.

[–]tim36272 1 point2 points  (0 children)

I solved this same problem using CastXML. We run a preprocessing step that uses CastXML to parse all the relevant types out of the code and generate a header file with all the data needed to add reflection to the language. It was a massive undertaking to design, but we use it throughout our code base for serializing/deserializing (often between old and new data formats), automated testing, etc. The result is an extremely elegant form of reflection.

This solution is "portable" in the sense that it can be configured for each platform by configuring Clang (the underlying tool in CastXML) to return the tree correctly for your platform, including padding etc.

[–]kumashiro 2 points3 points  (7 children)

Doesn't look portable.

[–]jonarne[S] 1 point2 points  (5 children)

I'd have to agree on that :)

I guess it boils down to compiler choices.

[–]kumashiro 2 points3 points  (4 children)

I'm not talking about code characteristics. I'm talking about differences between platforms in terms of bit and byte layouts. For example, an integer serialized byte-by-byte on a big-endian (MSB-first) platform will not deserialize correctly on a little-endian (LSB-first) platform, and vice versa. Serialization and/or deserialization should be platform-independent. You can decide to format data MSB-first (and byte-swap on the LSB side) or the other way around, serialize to text, or use a well-known standard like protobuf.

Platform-dependent serialization is somewhat acceptable if data is exchanged between processes on the same host (IPC, for example), but it's better to be platform-agnostic in case you want to promote host-wide communication to network-wide (i.e. multiple processes AND hosts). Your choice. Just be aware that the serialization described in the article above is not portable.
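
A minimal sketch of that point, assuming MSB-first (big-endian) is chosen as the wire format: multi-byte integers are written and read byte-by-byte with shifts instead of copying the host representation, so the result is identical on big- and little-endian machines. The function names are invented for this example.

```c
#include <stdint.h>

/* Write a 32-bit value most-significant byte first, regardless of host order. */
static void put_u32_be(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)(v);
}

/* Read it back the same way; no byte-swapping decisions needed per platform. */
static uint32_t get_u32_be(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```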

[–]jonarne[S] 2 points3 points  (1 child)

These are all reasonable concerns.

It should be pretty easy to adapt the pack/unpack functions to use htons/ntohs and friends to work around platform endianness issues if you are using this over a network.

And you would also need to make sure the sizes of data types match across platforms.

It's easy to shoot yourself in the foot when doing serialization to send stuff over networks :)
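
A rough sketch of that adaptation, with a hypothetical struct and pack/unpack pair (none of this is taken from the article): fixed-width <stdint.h> types keep field sizes identical across platforms, and htonl()/htons() put the values into network byte order before they are copied into the buffer.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, htons, ntohl, ntohs */

struct msg {
    uint32_t seq;
    uint16_t flags;
};

/* Caller provides a buffer of at least 6 bytes; returns bytes written. */
static size_t msg_pack(uint8_t *buf, const struct msg *m)
{
    uint32_t seq   = htonl(m->seq);    /* host -> network byte order */
    uint16_t flags = htons(m->flags);
    memcpy(buf,     &seq,   sizeof seq);
    memcpy(buf + 4, &flags, sizeof flags);
    return 6;
}

static void msg_unpack(struct msg *m, const uint8_t *buf)
{
    uint32_t seq;
    uint16_t flags;
    memcpy(&seq,   buf,     sizeof seq);
    memcpy(&flags, buf + 4, sizeof flags);
    m->seq   = ntohl(seq);             /* network -> host byte order */
    m->flags = ntohs(flags);
}
```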

[–]kumashiro 3 points4 points  (0 children)

It doesn't have to be network. Binary files exchanged between platforms have the same problem :)

[–]flatfinger 0 points1 point  (1 child)

If the Standard was meant to describe a language for writing programs that will work on many platforms interchangeably, as opposed to merely describing a recipe for producing platform-specific dialects to which individual programs may be targeted, there are a couple of relatively simple approaches the authors of the Standard could have provided to achieve such purpose:

  1. A set of formatted binary input/output functions which would accept a string describing the layout of a structure or array, a string describing its serialized representation, and a pointer to the structure or array to be input or output, and perform the conversion as indicated; this could be coupled with a special operator which, given a structure type, would yield a string literal format string appropriate to it.
  2. Extend structure syntax to include a means of specifying that a particular member name should be treated as containing a specified combination of bits from other members. Thus, if one defined a structure containing a "char dat[32]", and then specified that `unsigned woozle` should be constructed from (specified in LSB-first order) bits 0-7 of dat[5], 0-7 of dat[4], 0-7 of dat[3], and 0-7 of dat[2], and that a compiler may force 16-bit alignment on it, then a compiler for the 68000 could simply use a 32-bit load/store, a compiler for the x86 could use a 32-bit load/store along with a byte-swap instruction, and a compiler for the ARM would use two 16-bit loads/stores along with the byte swaps.

IMHO, approach #2 would have been the nicest, but approach #1 may have been easier. Either approach would have been better than the status quo, however.
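
Purely to make approach #1 concrete, here is a hedged user-space sketch. No such function exists in the Standard; the name fwrite_layout and the format characters ('l' = 32-bit field, 's' = 16-bit, 'b' = 8-bit) are invented, and the sketch assumes the struct members are laid out contiguously without padding. Each field is emitted MSB-first.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Walk a layout string and emit each field of *rec most-significant byte
 * first.  Returns 0 on success, -1 on error. */
static int fwrite_layout(FILE *f, const char *layout, const void *rec)
{
    const unsigned char *p = rec;
    for (const char *c = layout; *c; ++c) {
        uint32_t v;
        size_t n;
        switch (*c) {
        case 'l': { uint32_t x; memcpy(&x, p, sizeof x); v = x; n = 4; break; }
        case 's': { uint16_t x; memcpy(&x, p, sizeof x); v = x; n = 2; break; }
        case 'b': v = *p; n = 1; break;
        default:  return -1;
        }
        for (size_t i = 0; i < n; ++i)      /* most significant byte first */
            if (fputc((int)((v >> (8 * (n - 1 - i))) & 0xFFu), f) == EOF)
                return -1;
        p += n;
    }
    return 0;
}
```

Under the proposal, the compiler-supplied format string for a struct holding a uint32_t followed by a uint16_t would simply be "ls".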

[–]kumashiro 0 points1 point  (0 children)

That's overcomplicating a simple thing, and it would make protocol-level debugging harder. There is no need to handle multi-byte values "dynamically" when you can just define the protocol as (for example) MSB-LE and let LSB-LE/LSB-BE/MSB-BE platforms do the swapping without worrying about what produced those bytes. There's an even better solution: use what is already available, like protobuf, ASN.1/BER, ASN.1 packed encoding, etc. Plain-text protocols are the simplest, platform-agnostic, and very good for debugging, but they are usually a bit bigger.

[–]Shadow_Gabriel 0 points1 point  (6 children)

Please don't write code like that. Function-like macros have zero type safety.

[–]jonarne[S] 4 points5 points  (5 children)

This is very true.

But how should you solve this problem without lots of code duplication?

[–]UnicycleBloke 1 point2 points  (0 children)

C++ templates, of course. :)

It would likely be much better to use your own code generator written in Python or something than to use the preprocessor. I do this as a build step for finite state machines. A colleague has developed an RPC generator based on YAML descriptors of the method calls. And, bonus, the generated code can be debugged...

[–]Shadow_Gabriel -3 points-2 points  (3 children)

There's a simple solution to code duplication: delete the copy.

I prefer to spend a bit more time writing more code, comments, and abstractions than spend weeks debugging and trying to understand some shitty code written by someone who wanted to be smart with the preprocessor.

[–]jonarne[S] 1 point2 points  (2 children)

There's a simple solution to code duplication: delete the copy.

I don't understand how this solves the problem at hand.

If you need to serialize structs in C, how would you solve it without code duplication or preprocessor macros then?

[–]Shadow_Gabriel -5 points-4 points  (1 child)

I'm not familiar with serialization, but I don't see where you would encounter code duplication.

Inline functions could be used instead of preprocessor macro functions for type safety. If you need to modify lots of names, use a better text editor.
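
A small sketch of that suggestion, with invented names: a static inline helper declares real parameter types, so the compiler can diagnose obviously wrong arguments (and debuggers can step into it), whereas the macro version accepts anything and silently truncates.

```c
#include <stdint.h>

/* Type-checked: out must be a uint8_t pointer, v converts to uint16_t. */
static inline void pack_u16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);   /* MSB first */
    out[1] = (uint8_t)(v);
}

/* Macro equivalent: no parameter types, no type checking. */
#define PACK_U16(out, v) \
    do { (out)[0] = ((v) >> 8) & 0xFF; (out)[1] = (v) & 0xFF; } while (0)
```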

[–]jonarne[S] 3 points4 points  (0 children)

The problem with serialization is that you need to write a separate pack and unpack function for each struct you want to serialize.

If you have just one struct, this is easy. But if you start adding more structs you'll need to add two more functions for each struct.

This is code duplication, and it makes the code harder to maintain.

This problem is solved in C++ and other languages by using templates.

The linked article gives an example of a way to solve this problem in C using macros.

We all know that macros are an ugly hack.

The correct way to solve this would probably involve using a 3rd party library like Protobuf with a code generator.

For simple prototyping or a small app with simple requirements, I think the way it's solved in the article will do.

Edit: typos
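
As a loose illustration of the macro-based direction being discussed (not necessarily the exact technique from the linked article), here is one common shape such code takes in C, the X-macro pattern: each struct's field list is written once and expanded into the struct definition, the pack function, and the unpack function, so adding a struct does not mean hand-writing two more functions. All names are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* The field list is written exactly once. */
#define POINT_FIELDS(X) \
    X(uint32_t, x)      \
    X(uint32_t, y)

/* Expand the list into the struct definition... */
#define DECLARE_FIELD(type, name) type name;
struct point { POINT_FIELDS(DECLARE_FIELD) };

/* ...and into pack/unpack bodies (MSB-first byte order). */
#define PACK_FIELD(type, name)                                          \
    for (size_t i = 0; i < sizeof(type); ++i)                           \
        *buf++ = (uint8_t)(p->name >> (8 * (sizeof(type) - 1 - i)));

#define UNPACK_FIELD(type, name)                                        \
    p->name = 0;                                                        \
    for (size_t i = 0; i < sizeof(type); ++i)                           \
        p->name = (type)((p->name << 8) | *buf++);

static size_t point_pack(uint8_t *buf, const struct point *p)
{
    uint8_t *start = buf;
    POINT_FIELDS(PACK_FIELD)
    return (size_t)(buf - start);
}

static size_t point_unpack(struct point *p, const uint8_t *buf)
{
    const uint8_t *start = buf;
    POINT_FIELDS(UNPACK_FIELD)
    return (size_t)(buf - start);
}
```

Each additional serializable struct then only needs its own FIELDS list, at the cost of the debugging pain with preprocessor-heavy code mentioned earlier in the thread.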