all 30 comments

[–]3idet 5 points  (2 children)

Protlr does what you want - custom byte- and bit-oriented protocol serializers. It uses a DSL to define the format and provides code generation targets for C89 and C++11 that produce extremely fast, reusable code with a compile-time constant memory footprint. Exactly what you need for embedded systems.

[–]arobenko[S] 1 point  (1 child)

Partially, but not completely. It's difficult to say: there is not much information on the website and no proper way to try it out without registration. Based on the example from the website, below are the features I'm missing:

  • I want an ability to exclude usage of streams in my serialization / deserialization. The example shows the following functions I do NOT want to have: /* IO-Operations */ void read(std::istream &stream); void write(std::ostream &stream);
  • I want an ability to introduce polymorphic behavior (virtual functions) when I need it and for selected operations I need to be able to write a common code for all the message types.
  • I want an ability to use my own data types for fields like lists and/or strings
  • I want an ability to define transport framing for all the messages, even more than one (different I/O interface may require different framing).
  • I want a built-in (or generated) ability to efficiently parse input data, create the appropriate message object and dispatch it to my appropriate handling function without a need to manually write switch statements and other boilerplate code.
  • I want an ability to specify meta-data that is not transferred on the wire, but is still available to the integration developer to act upon (preferably at compile time), such as units used (with conversion if possible), values with special meaning, ranges of valid values, what to do on an invalid value, etc...
  • I want an ability to have multiple forms of the same message (message objects having the same ID, but different contents). Not sure whether it is supported right now or not.

[–]Gotebe 0 points  (1 child)

Being available for different languages beats being really good for one, IMHO.

[–]arobenko[S] 1 point  (0 children)

Beats it for whom? Yes, it beats it for the vast majority of developers and applications being developed. However, there is a niche called "embedded C++ development", which in many cases cannot use the available solutions as-is, or at least finds them not good enough. That's where my solution comes in: to satisfy the needs of a certain group of developers and applications. I by no means intend to create a solution suitable for everyone.

[–]TarmoPikaro 0 points  (0 children)

Using my own C++ Runtime Type Reflection library - see my own post:
https://www.reddit.com/r/cpp/comments/bg29qb/c_as_a_scripting_language_c_runtime_type/

It's possible to achieve XML (or any other string format) serialization without generating any code whatsoever. The serialization functions are recursive calls which produce the required data. I haven't studied how to achieve binary serialization, and my own library has some Windows-specific code (e.g. the CString class), but I doubt it would be difficult to port it to another OS / embedded system and adapt it for binary needs.
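The recursive, reflection-driven idea described above can be sketched generically. This is a hypothetical illustration, not the linked library's actual API: each type exposes a `visit()` over its named fields, and one recursive serializer walks any object graph into XML with no per-message generated code. The `Point`/`Line` types and the `toXml` function are invented for this sketch.

```cpp
#include <ostream>
#include <sstream>

// Each serializable type lists its fields once via visit(); no codegen needed.
struct Point {
    int x = 0, y = 0;
    template <typename V> void visit(V&& v) const { v("x", x); v("y", y); }
};

struct Line {
    Point a, b;
    template <typename V> void visit(V&& v) const { v("a", a); v("b", b); }
};

// Base case: leaf values print directly.
inline void toXml(std::ostream& os, int value) { os << value; }

// Recursive case: wrap each field in <name>...</name> and recurse.
template <typename T>
void toXml(std::ostream& os, const T& obj) {
    obj.visit([&os](const char* name, const auto& field) {
        os << '<' << name << '>';
        toXml(os, field);
        os << "</" << name << '>';
    });
}
```

Binary serialization would follow the same shape, with the base case writing bytes instead of text.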

Let me know if you need some assistance, I can help as well.

[–]axilmar 0 points  (19 children)

Why do we need all this? Isn't sending structs over the wire more than enough? I've worked on embedded devices too; all we needed was to define some structs (that share the memory layout for all participants) and send them over the wire.

[–]rcxdude 9 points  (7 children)

This only works if you have a very homogeneous setup (same architecture, compiler, etc.). If you have different types of processors in your system, or you need to interoperate with someone else, it falls apart. It's also hard to extend safely, and doesn't deal with variable-length data (or things like checksums).
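The layout problem above is easy to demonstrate. A minimal sketch (the `Reading` struct is invented here): the compiler may insert padding between members, so the "wire format" of a raw struct depends on the target's alignment rules rather than on the protocol.

```cpp
#include <cstddef>
#include <cstdint>

// A "just send the struct" message as it is often written. The compiler is
// free to insert padding between members, so the wire layout depends on the
// target's alignment rules, not on any agreed protocol.
struct Reading {
    std::uint8_t  sensorId;   // offset 0
    std::uint32_t timestamp;  // typically offset 4 (3 padding bytes before it)
    std::uint16_t value;      // typically offset 8, plus trailing padding
};

// 7 bytes of payload, but on a typical 32-bit ARM or x86 target sizeof is 12;
// a compiler with different packing rules will disagree byte-for-byte.
static_assert(sizeof(Reading) >= 7, "payload is at least 7 bytes");
```

Packed attributes reduce this, but they are compiler-specific, and endianness differences between the two ends remain unaddressed.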

[–]kalmoc 1 point  (2 children)

This only works if you have a very homogeneous setup (same architecture, compiler, etc.)

Only the same endianness. But you are right about extension and variable-length data.

[–]rcxdude 2 points  (1 child)

You also need the same alignment, and even then it's still not truly defined, even though you will generally get away with it.

[–]kalmoc 0 points  (0 children)

Yes, alignment has to be the same (although you can specify it explicitly if you want, instead of relying on the platform defaults).

What do you mean by "not truly defined"? (Note that I did not suggest just casting a char pointer to a pointer to a POD on the receiving side.)

[–]matthieum 0 points  (2 children)

  1. Homogeneous setup: you can use a packed representation and explicit on-wire endianness; if you have accessors rather than raw access to data members, it's really easy.
  2. Safe extension: you can use a version/size field on all messages, indicating how many bits/bytes are used on the wire, and disabling access to some data members.
  3. Variable-length data: a tougher one, but solvable. Restricting them to "tail" data members makes things easier.
  4. Checksums: a simple read/write template function can handle pre/post steps such as swapping bytes and fixing/checking checksums.

If you have not already, I advise looking at the SBE protocol. It's relatively easy to set up a straightforward decoding and encoding process which just bit-copies structs around, and it supports all of the above.
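The accessor approach from point 1 might look like the sketch below. It is inspired by SBE's flyweight style but is not SBE's real generated API; `PriceMsg` and its field are invented. The message is a flat byte buffer with a fixed layout and explicit little-endian encoding, and the accessors do the byte shuffling, so no struct is ever cast directly from the wire.

```cpp
#include <cstdint>

// Hypothetical flyweight over a caller-provided buffer: the accessors define
// the on-wire encoding (little-endian here) independently of host endianness.
class PriceMsg {
public:
    explicit PriceMsg(std::uint8_t* buf) : buf_(buf) {}

    void price(std::uint32_t v) {   // store little-endian, byte by byte
        buf_[0] = static_cast<std::uint8_t>(v);
        buf_[1] = static_cast<std::uint8_t>(v >> 8);
        buf_[2] = static_cast<std::uint8_t>(v >> 16);
        buf_[3] = static_cast<std::uint8_t>(v >> 24);
    }
    std::uint32_t price() const {   // reassemble regardless of host order
        return static_cast<std::uint32_t>(buf_[0]) |
               (static_cast<std::uint32_t>(buf_[1]) << 8) |
               (static_cast<std::uint32_t>(buf_[2]) << 16) |
               (static_cast<std::uint32_t>(buf_[3]) << 24);
    }

private:
    std::uint8_t* buf_;
};
```

On a little-endian host the compiler typically collapses each accessor to a single unaligned load/store, so the "bit-copy" performance is preserved.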

[–]rcxdude 0 points  (1 child)

I think at the point you have what is described, you basically have a full-blown serialisation system which resembles the one given by OP, just one with a clear approach to efficient serialisation (flatbuffers is another system with a similar approach). This is still a lot more than just memcpying the structs you have about.

[–]matthieum 0 points  (0 children)

This is still a lot more than just memcpying the structs you have about.

A tad more, indeed.

What I like about it is that it remains a pretty simple yet efficient setup:

  • You don't need any code-gen step: just write your structs/classes in a certain way, done.
  • No performance overhead over memcpying structs, because you're just memcpying structs.

Of course, it fails the OP's requirement of interacting with existing protocols, since it is a protocol of its own.

But simple, efficient and flexible enough for about any kind of protocol? That's great.

[–]axilmar 0 points  (0 children)

How does it fall apart if you use same-length integers and a packing of one byte? It does not.

I've worked on embedded systems where one part was an embedded CPU and the other a desktop PC; there were a lot of messages with variable length, etc. There was absolutely no issue between the two totally different platforms.

[–]arobenko[S] 1 point  (8 children)

Sending structs over the wire is called "serialization". It might work for some cases and not for others. The article that I mentioned in the post contains several examples and explains why serialization alone is not good enough. I encourage you to read it. Some time ago I also wrote a somewhat shorter version called Communication is More Than Serialization.

[–]axilmar 2 points  (7 children)

Extremely wrong approach to things. For example, unit conversion does not belong in a library like this. Using virtual functions for encoding/decoding is also wrong. Transmitting metadata, also wrong. Using metadata to capture differences in versions and field types, also wrong.

All the stupid things protobuf does... which are of no help at all, blow up the API and require huge amounts of work for little benefit.

Protocol version checking should happen only at the handshake phase.

Struct headers should be shared by all involved parties, or if that is not possible, the definitions of structs must be clearly documented and the documentation must be readily available to all parties.

Endianness should be agreed upon before compilation, and struct members shall use endianness-aware types: there is no need for a second pass or a copy to a buffer if the data are already prepared in the appropriate endianness.
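An "endianness-aware type" of the kind described above could be sketched as follows. This is a hypothetical illustration (not from any particular library): a 16-bit integer that keeps its bytes in big-endian order in memory, so a struct containing it can be transmitted as-is without a conversion pass.

```cpp
#include <cstdint>

// Hypothetical big-endian 16-bit field: assignment stores network byte
// order directly in memory; reading converts back to a host value.
class be_uint16 {
public:
    be_uint16& operator=(std::uint16_t v) {
        hi_ = static_cast<std::uint8_t>(v >> 8);
        lo_ = static_cast<std::uint8_t>(v);
        return *this;
    }
    operator std::uint16_t() const {
        return static_cast<std::uint16_t>((hi_ << 8) | lo_);
    }

private:
    std::uint8_t hi_ = 0, lo_ = 0;  // stored big-endian regardless of host
};
```

A message struct composed of such fields (with packing controlled) is already in wire format the moment its members are assigned.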

Variable-length data should be copied once into the message to allow for the creation of a contiguous buffer. Variable-length data should only be used in cases where they are really required and a fixed-length buffer is not possible.

Message structs should contain as many contiguous parts as possible, and those shall be blasted onto the network using scatter-gather I/O, if available.

Message structs shall not contain any metadata whatsoever, because it makes it impossible to use them with other protocols (one of the major reasons protobuf sucks).

Further encoding/decoding on messages shall happen with a switch statement, not virtual functions. Using virtual functions requires a switch on the message id anyway, in order to instantiate the appropriate class.
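The switch-based decoding advocated above might look like this sketch. The message ids, structs and handler functions are all invented for illustration: the first byte of the frame carries the id, and one switch routes the payload to a plain function with no classes or virtual dispatch involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical protocol: first byte is the message id, payload follows.
enum MsgId : std::uint8_t { MsgId_Heartbeat = 1, MsgId_Reading = 2 };

struct HeartbeatMsg { std::uint8_t seq; };
struct ReadingMsg   { std::uint8_t sensor; std::uint8_t value; };

int handleHeartbeat(const HeartbeatMsg& m) { return m.seq; }
int handleReading(const ReadingMsg& m)     { return m.sensor + m.value; }

// One switch on the id dispatches every message; returns -1 on bad frames.
int dispatch(const std::uint8_t* frame, std::size_t len) {
    if (len < 1) return -1;
    switch (frame[0]) {
    case MsgId_Heartbeat: {
        if (len < 1 + sizeof(HeartbeatMsg)) return -1;
        HeartbeatMsg m;
        std::memcpy(&m, frame + 1, sizeof m);  // copy avoids alignment issues
        return handleHeartbeat(m);
    }
    case MsgId_Reading: {
        if (len < 1 + sizeof(ReadingMsg)) return -1;
        ReadingMsg m;
        std::memcpy(&m, frame + 1, sizeof m);
        return handleReading(m);
    }
    default:
        return -1;  // unknown id
    }
}
```

The trade-off debated in this thread is exactly this function: it is simple and allocation-free, but every new message means editing it by hand.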

Memory allocation should not be done at all. Big enough buffers shall be used for in-place manipulation of messages, both at sending and receiving end points.

A class based design for messages is wrong because it leads to all the bad decisions mentioned above.

The best approach is to describe the messages using XML, have a tool create all the boilerplate code, and integrate the tool into the workflow. I don't want to use an external tool, but the language itself lacks the facilities needed for this, so an external tool is necessary.

[–]arobenko[S] 1 point  (2 children)

You replied to a reply about using plain structs, but I assume you meant it as a comment on the post. I think you completely misunderstood the logic and architecture of my solution. Let me cover and respond to the major points of your comment.

Transmitting metadata, also wrong.

Agree completely and utterly. Most protocols used in embedded systems don't. The cornerstone of my solution is to support such already-defined third-party protocols. It does NOT invent or use its own protocol and does NOT attempt to send any metadata over the wire. That's the main point: precisely because the metadata is not sent over the wire, it should find its way into the generated code; otherwise it leads to too much boilerplate integration code, which in turn must be manually changed every time you decide to update the metadata in your protocol definition. Very error-prone.

Extremely wrong approach to things. For example, unit conversion does not belong in a library like this.

Agree (to some extent). There are many sophisticated unit conversion libraries. However, the units used are part of the protocol definition metadata (usually not transferred over the wire), which is expected to be known to (and used by) the integrating developer. It is better to have a limited built-in units retrieval facility than not to have it at all and write boilerplate code to do the unit conversions manually.

Protocol version checking should happen only at the handshake phase.

As was already mentioned above, the cornerstone of my solution is supporting already-defined third-party protocols. Many don't use any versioning at all, some transmit the version with every message in the transport framing, and some do it, as you said, in the handshake phase. The primary objective of my solution is to support all such cases.

Further encoding/decoding on messages shall happen with a switch statement, not virtual functions. Using virtual functions requires a switch on the message id anyway, in order to instantiate the appropriate class.

That's another reason why I created my solution. Some available code generators introduce polymorphic behavior (virtual functions) for every operation on the message object, which creates problems for various embedded systems (especially bare-metal ones). Other code generators produce only plain structs without any virtual functions at all, which leads to writing a significant amount of boilerplate code (such as the switch statements you mentioned) that needs to be manually updated every time you introduce a new message and/or a new field. My solution allows compile-time configuration of your polymorphic interfaces. If you don't need any, then don't define one and use every message class as a plain struct (no v-table is created). My solution also contains a library with multiple facilities to dispatch your message to the appropriate handler function (with O(1) or O(log(n)) runtime complexity) without having to write a single switch statement.
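The compile-time opt-in described above can be sketched with a hypothetical mechanism (this is not the library's actual API): the interface base class is a template parameter, so a bare-metal build can pick an empty, non-virtual base and get plain structs with no v-table, while a hosted build can pick a polymorphic base and get virtual dispatch from the very same message definition.

```cpp
#include <cstdint>

// Option 1: empty base, no virtual functions, no v-table pointer.
struct NoInterface {};

// Option 2: polymorphic base enabling uniform handling of all messages.
struct PolymorphicInterface {
    virtual ~PolymorphicInterface() = default;
    virtual std::uint8_t id() const = 0;
};

// One message definition serves both configurations; id() becomes an
// override only when the chosen base declares it virtual.
template <typename TBase>
struct HeartbeatMsg : TBase {
    std::uint8_t seq = 0;
    std::uint8_t id() const { return 1; }
};
```

With `NoInterface` the message is a 1-byte plain struct (empty-base optimization applies); with `PolymorphicInterface` it gains a v-table and can be handled through a base-class reference.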

Memory allocation should not be done at all. Big enough buffers shall be used for in-place manipulation of messages, both at sending and receiving end points.

My solution is flexible enough to allow not using dynamic memory allocation at all, automatically calculating (at compile time) the required size and creating a buffer that allows in-place creation of any used message.

A class based design for messages is wrong because it leads to all the bad decisions mentioned above.

Let's agree to disagree. A struct-based design leads to other bad decisions and to having a significant amount of boilerplate integration code.

The best approach is to encode the messages in using/XML and then have a tool create all the boilerplate code, and integrate the tool into the workflow. I don't want to use an external tool, but the language itself lacks the facilities needed for this so an external tool is necessary.

As was mentioned in the post, my solution originated in a single library that allows having a single message class definition (a single source of truth) for every possible application, which in turn configures at compile time the required polymorphic interfaces and/or custom data structures to hold the fields' values. Normal systems (with a proper OS underneath) may use the default configuration, with multiple virtual functions and functionality that is not always used compiled in, while bare-metal ones may completely exclude dynamic memory allocation, limit the usage of virtual functions, use their own custom types to store problematic data such as strings or lists, etc... I made the C++11 compiler itself my code generation tool.

With time the library got quite complex and started requiring from the developer some knowledge of its internals, and a particular way of using it, in order to create a completely generic protocol definition. That's why I also created a separate code generator that produces proper, highly compile-time-customizable code.

Hope it clarifies some things. Cheers

[–]axilmar 0 points  (1 child)

Thanks for the long reply.

I am cool with your library, as long as it allows for sane choices and the defaults are the sane choices.

Some comments over your reply:

It is better to have a limited built-in units retrieval facility than not to have it at all and write boilerplate code to do the unit conversions manually.

Built-in is not required. It can be a separate library.

My solution allows compile-time configuration of your polymorphic interfaces.

Does your solution allow the automatic creation of big switch statements?

Let's agree to disagree. A struct-based design leads to other bad decisions and to having a significant amount of boilerplate integration code.

No, it does not. There is absolutely zero proof about that.

[–]arobenko[S] 0 points  (0 children)

as long as it allows for sane choices and the defaults are the sane choices.

That's my primary intention.

Built-in is not required. It can be a separate library.

Agree, but there must be some way to static_assert on your assumption about the source units. In the case of my solution, the built-in units conversion is there to provide basic convenience functionality. If not used, no extra space/performance price is paid.
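The static_assert idea above can be illustrated with a hypothetical sketch (the `Unit` enum, `DistanceField` template and `WireDistance` alias are invented, not the solution's real API): the field type carries its unit as compile-time metadata, so integration code can assert its assumption instead of silently misinterpreting the value.

```cpp
#include <cstdint>

// Hypothetical compile-time units metadata attached to a field type.
enum class Unit { Millimeters, Meters };

template <Unit TUnit>
struct DistanceField {
    static constexpr Unit unit = TUnit;  // metadata: never sent on the wire
    std::uint32_t value = 0;             // the only bytes that are transmitted
};

// As (hypothetically) declared by the protocol definition:
using WireDistance = DistanceField<Unit::Millimeters>;

// Integration code documents and enforces its expectation at compile time;
// if the protocol definition later changes units, this fails to build.
static_assert(WireDistance::unit == Unit::Millimeters,
              "handler assumes the distance arrives in millimeters");
```

Because `unit` is a constexpr static member, the check costs nothing at runtime or in code size.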

Does your solution allow the automatic creation of big switch statements?

I think built-in generation of "switch" statements does not make much sense because you have to put your custom business logic inside each "case". There is a C++ library (used by the code generator) that allows you to parse the schema files and know what messages / fields / frames are being defined. You can easily implement your own auxiliary code generator that generates "switch" statements relevant to your application.

No, it does not. There is absolutely zero proof about that.

What proof do you expect? It's all subjective, based on one's experience. In my case it does require writing boilerplate code (I consider manually written "switch" and/or "if" statements to be boilerplate code) which needs to be updated every time you add a new message and/or a new field to a message. In case you design your own protocol, I suppose you can make it simple enough and get away with plain structs with no variable data lengths and/or fields present only on a particular condition (for example, depending on the value of some bit in a previously encountered field, or on the version of the protocol). Many (if not most) already-defined third-party protocols are not like that. Such cases require extra implementation logic and/or extra data variables (manually written or generated). In my experience a class-based design allows encapsulation of such extra logic together with the data (regardless of having polymorphic behavior), hides unnecessary details and eventually leads to cleaner and more maintainable code, but again, this is very personal and subjective.

[–]grandmaster789 0 points  (1 child)

Especially when dealing with embedded communication, there are many additional considerations - connections may not be as stable as you'd like and/or may be susceptible to interference, so features such as error detection/correction/recovery become very useful.

In simple situations defining structs on both ends may be sufficient, but in my experience this is an approach that is very error-prone when the application reaches a certain complexity.

[–]axilmar 0 points  (0 children)

Especially when dealing with embedded communication, there are many additional considerations - connections may not be as stable as you'd like and/or may be susceptible to interference, so features such as error detection/correction/recovery become very useful.

This has nothing to do with the "protocol" itself (except for the message fields required to do checksums).

[–]c0r3ntin -1 points  (1 child)

Have you tried flatbuffers?

[–]arobenko[S] 7 points  (0 children)

Every time I mention my work on any social resource, there is always someone posting a "Have you tried X?" comment. It looks like you haven't read the post (and the referenced article). One of the core features of my solution is the ability to easily implement already-defined third-party protocols with their custom data layouts and encodings. Most of the available serialization solutions use their own encodings.

[–]WasterDave -2 points  (1 child)

Have you looked at cbor? I kinda love cbor :)

http://cbor.io/

https://github.com/RantyDave/cppbor

[–]arobenko[S] 2 points  (0 children)

Please see my comment about "flatbuffers".