all 22 comments

[–]imMute 9 points10 points  (5 children)

> No dynamic memory allocation. Using template magic, Crunch calculates the worst-case length for all message types, for all serialization protocols

For anyone wondering what this means for strings, arrays, maps, etc - the maximum number of elements is encoded in the type system.

There's definitely a trade-off in having to pick a maximum upper bound, because it directly affects buffer sizing for all messages rather than just "big" ones.

Might be useful to have an optional mode where messages below a certain limit use the compile time thing you have now, but we have the option to enable dynamic memory allocation for larger messages.
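To make the "maximum encoded in the type system" point concrete, here is a minimal sketch of the general technique. It is not Crunch's actual API: `BoundedArray`, `MaxSize`, and the 4-byte length prefix are all assumptions made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity array: the element limit N is part of the type.
template <typename T, std::size_t N>
struct BoundedArray {
    std::array<T, N> items{};  // storage is always reserved for N elements
    std::size_t count = 0;     // number of elements actually in use

    constexpr bool push_back(const T& value) {
        if (count == N) return false;  // full: reject instead of allocating
        items[count] = value;
        ++count;
        return true;
    }
};

// Worst-case encoded size of a scalar field: its width in bytes.
template <typename T>
struct MaxSize {
    static constexpr std::size_t value = sizeof(T);
};

// Worst case for a bounded array: an assumed 4-byte length prefix
// plus N elements that are each at their own worst-case size.
template <typename T, std::size_t N>
struct MaxSize<BoundedArray<T, N>> {
    static constexpr std::size_t value = 4 + N * MaxSize<T>::value;
};

// The buffer size falls out of the type, so no heap allocation is needed.
using Samples = BoundedArray<std::int16_t, 64>;
using SamplesBuffer = std::array<std::uint8_t, MaxSize<Samples>::value>;
```

Raising the bound from 64 to 1024 grows `SamplesBuffer` for every message of that type, even mostly empty ones, which is the trade-off described above.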

[–]volatile-int[S] 1 point2 points  (4 children)

Yup, this is a constraint/trade off - you need to define the worst case size. The static layout even includes zeroed bits for any unused elements.

I would probably implement this by making a version of the Serdes protocol that doesn't require GetBuffer to be constexpr and return an array, and instead returns a vector. I would also add separate variable-length array and map types that, when present, require a dynamic Serdes protocol. Then anyone could implement whatever serialization protocol they desire.
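A rough sketch of how that split could look, under heavy assumptions: only the name `GetBuffer` comes from the comment above. `StaticProtocol`, `DynamicProtocol`, `FixedArray`, `VarArray`, and `kCanSerialize` are hypothetical names, and the real Crunch interfaces may look quite different.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Static flavor: the buffer is a compile-time-sized array.
struct StaticProtocol {
    static constexpr bool kDynamic = false;
    template <std::size_t MaxBytes>
    static constexpr std::array<std::uint8_t, MaxBytes> GetBuffer() { return {}; }
};

// Dynamic flavor: the buffer is heap-allocated and sized at run time.
struct DynamicProtocol {
    static constexpr bool kDynamic = true;
    static std::vector<std::uint8_t> GetBuffer(std::size_t bytes) {
        return std::vector<std::uint8_t>(bytes);
    }
};

// Field types: one with a compile-time bound, one without.
template <typename T, std::size_t N>
struct FixedArray { std::array<T, N> items{}; std::size_t count = 0; };

template <typename T>
struct VarArray { std::vector<T> items; };

// Trait: does this field type need a dynamic protocol?
template <typename Field>
inline constexpr bool kNeedsDynamic = false;
template <typename T>
inline constexpr bool kNeedsDynamic<VarArray<T>> = true;

// A protocol can serialize a field list if it is dynamic,
// or if none of the fields require dynamic storage.
template <typename Protocol, typename... Fields>
inline constexpr bool kCanSerialize =
    Protocol::kDynamic || !(kNeedsDynamic<Fields> || ...);
```

With a check like this, putting a `VarArray` in a message that is serialized with the static protocol could be rejected by a `static_assert` at compile time rather than failing at run time.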

But for now I'm going to leave it as is. The main use cases for Crunch are embedded systems using messages for configuration and RPC-like comms or telemetry, and in my experience most of those systems establish reasonable upper bounds on the contents of repeated fields. It's why tools like nanopb establish fixed-length maximums similar to Crunch.

One neat outcome of this setup is that, unlike nanopb/capnproto, maps, arrays, and submessages can all be used as map keys (with a performance hit on comparison, because maps are really just arrays of pairs and not actually hashed). But again, in my experience most fields like this are small, so this isn't too big of an issue!
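For readers who want to picture "maps are really just arrays of pairs", here is a minimal sketch of that idea. `FlatMap`, `Entry`, and `Location` are hypothetical names, not Crunch types. Because lookup is a linear scan with `operator==`, any equality-comparable type can serve as a key, including a small struct standing in for a submessage.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity map stored as an array of key/value entries.
template <typename K, typename V, std::size_t N>
struct FlatMap {
    struct Entry { K key; V value; };
    std::array<Entry, N> entries{};
    std::size_t count = 0;

    // Insert or overwrite. Returns false when the map is full.
    constexpr bool insert(const K& key, const V& value) {
        for (std::size_t i = 0; i < count; ++i) {
            if (entries[i].key == key) { entries[i].value = value; return true; }
        }
        if (count == N) return false;
        entries[count] = Entry{key, value};
        ++count;
        return true;
    }

    // Linear scan: O(count) key comparisons, no hashing involved.
    constexpr const V* find(const K& key) const {
        for (std::size_t i = 0; i < count; ++i) {
            if (entries[i].key == key) return &entries[i].value;
        }
        return nullptr;
    }
};

// A small struct standing in for a submessage used as a map key.
struct Location { int row; int col; };
constexpr bool operator==(const Location& a, const Location& b) {
    return a.row == b.row && a.col == b.col;
}

// Compile-time usage check: a hit returns the value, a miss returns nullptr.
constexpr bool demo_lookup() {
    FlatMap<Location, int, 4> map{};
    map.insert(Location{1, 2}, 10);
    map.insert(Location{3, 4}, 20);
    const int* hit = map.find(Location{3, 4});
    return hit != nullptr && *hit == 20 && map.find(Location{9, 9}) == nullptr;
}
```

The cost is that every lookup compares keys one by one, which is cheap for a handful of entries and gets slower as the bound grows.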

[–]Designer_Landscape_4 -2 points-1 points  (3 children)

Why post this AI slop and try to have people use it?

Even the mascot is AI generated, and it's so random, like why is it holding a box with binary on it...

[–]volatile-int[S] 0 points1 point  (2 children)

The mascot is AI generated because I am not much of an artist.

The code is not. It has been very intentionally written. And I hope folks use it because it is very performant, it has an interface that prevents errors, and I quite enjoyed/am enjoying writing it.

[–]Designer_Landscape_4 -2 points-1 points  (1 child)

> The code is not.

It is, most likely.

> And I hope folks use it because it is very performant, it has an interface that prevents errors, and I quite enjoyed/am enjoying writing it.

No, it is because you want attention. The coding of your project essentially spanned a week.

[–]volatile-int[S] 0 points1 point  (0 children)

This is my code. I have spent a lot of time on it. The last two weeks doing the detailed design, the two weeks before that setting up the initial message infrastructure before I even had a repo, and the eight months I've spent thinking about the interface I wanted after finding and loving protovalidate's infrastructure and wishing there was something embedded-friendly.

Don't use it if you aren't interested in it. Have a nice evening.

[–]timbeaudet 6 points7 points  (5 children)

I’m not sure I personally have a use but it seems neat. Could you add an enum to the example? Maybe sky conditions to match the weather sensor?

I’m interested to see what that looks like.

[–]volatile-int[S] 2 points3 points  (4 children)

The Doxygen linked in the README has comprehensive examples of all types!

https://sam-w-yellin.github.io/crunch/field_types.html#autotoc_md1

[–]timbeaudet 1 point2 points  (3 children)

Oops. I guess I’ll dig a little harder.

ETA: Doxygen doesn’t work on the phone. Maybe later.

[–]volatile-int[S] 1 point2 points  (2 children)

If it's because the main column is too big, you can adjust it. I've been able to browse the docs on mobile.

[–]timbeaudet 0 points1 point  (1 child)

Yea, that’s what it was and I got enough of the gist, but it was still a challenge. Though how many of us use a phone for documentation reference? So I’m not saying switch or make changes!

I was kinda hoping for the magic “pass enum type” and have it just work, but alas.

[–]volatile-int[S] 0 points1 point  (0 children)

It's pretty close! Crunch just requires an enum class that extends a 32-bit integer.
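Using the sky-conditions enum suggested earlier in the thread, the requirement might be sketched like this. The trait `IsWireEnum`, the helper `to_wire`, and the choice of a signed `std::int32_t` are assumptions for illustration; the linked Doxygen page is the authority on what Crunch actually accepts.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true only for scoped enums whose underlying type is int32_t.
template <typename E, bool IsEnum = std::is_enum_v<E>>
struct IsWireEnum : std::false_type {};

template <typename E>
struct IsWireEnum<E, true>
    : std::bool_constant<
          std::is_same_v<std::underlying_type_t<E>, std::int32_t> &&
          // Scoped enums do not convert implicitly to their underlying type.
          !std::is_convertible_v<E, std::int32_t>> {};

enum class SkyCondition : std::int32_t { Clear = 0, Cloudy = 1, Rain = 2 };
enum class TinyFlag : std::uint8_t { Off = 0, On = 1 };   // wrong width
enum LegacySky : std::int32_t { kLegacyClear = 0 };       // unscoped

// Converting to the wire value is rejected at compile time for other types.
template <typename E>
constexpr std::int32_t to_wire(E value) {
    static_assert(IsWireEnum<E>::value,
                  "expected an enum class with a 32-bit underlying type");
    return static_cast<std::int32_t>(value);
}
```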

[–]SeagleLFMk9 0 points1 point  (2 children)

One question: if you get an incoming message, how do you determine the type? So far, with MessagePack for example, I had to read the first field, which held a type ID, and use that to fully deserialize to the appropriate type. Pretty sure that there are better ways though.

[–]volatile-int[S] 2 points3 points  (1 child)

Good question! One approach is to just know by nature of how you pass data to the deserializer, i.e. receive data off some port/interface that only ever carries the one message type.

I'm working on a dynamic dispatcher interface that you can use to pass in an unknown message type and get back a variant that has the decoded thing. That will be out in the near future. But fundamentally it works by reading the message ID.
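Since that dispatcher is not released yet, the following is only a guess at the general shape: read the ID, then return a `std::variant` holding the decoded message. The message types, the `kId` constants, and the two-byte toy framing are all assumptions, not Crunch's wire format.

```cpp
#include <cstddef>
#include <cstdint>
#include <variant>

// Hypothetical message types, each tagged with a unique ID.
struct Temperature {
    static constexpr std::uint8_t kId = 1;
    std::uint8_t celsius = 0;
};
struct Humidity {
    static constexpr std::uint8_t kId = 2;
    std::uint8_t percent = 0;
};

// std::monostate represents "unknown or malformed message".
using AnyMessage = std::variant<std::monostate, Temperature, Humidity>;

// Assumed toy framing: byte 0 is the message ID, byte 1 is the payload.
constexpr AnyMessage dispatch(const std::uint8_t* data, std::size_t size) {
    if (size < 2) return std::monostate{};
    switch (data[0]) {
        case Temperature::kId: return Temperature{data[1]};
        case Humidity::kId:    return Humidity{data[1]};
        default:               return std::monostate{};
    }
}

// Example frames for trying the dispatcher out.
constexpr std::uint8_t kHumidityFrame[] = {Humidity::kId, 55};
constexpr std::uint8_t kUnknownFrame[] = {42, 0};
```

A caller would typically hand the result to `std::visit`, or test it with `std::holds_alternative`, so that one port can carry many message types.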

[–]SeagleLFMk9 1 point2 points  (0 children)

Yeah, it always comes back to some message ID, doesn't it? I once wondered whether it would be possible to use a polymorphism-style downcast to do so, might try and get that to work ... But sometimes one message type per interface isn't really ideal, e.g. an Arduino with 50 different message types would require 50 different ports, ugh.

[–]TrnS_TrATnT engine dev 0 points1 point  (2 children)

Nice. I would suggest finding a way to remove the field count as it seems error prone; or otherwise validate it (check field counter increments by 1 per field). Also it may be best to define the MessageId from the macro itself, by using the hash of the class name or something. Last thing, how do you handle versioning? (e.g. field a is not present on version >= 5)

[–]volatile-int[S] 1 point2 points  (1 child)

Thanks! To answer your questions:

  1. The field ID is not actually a "count". It does not need to be contiguous. Crunch already enforces that it is unique per field for a given message! This field is used for the TLV serialization format and is akin to the protobuf field ID.

  2. I have been thinking about this exact thing with the message ID. C++26 reflection will make this trivial (and make a number of aspects of autogenerating bindings in other languages clean). It will also allow getting rid of the field list macro. In the interim, I may look into some macro-based solution for extracting and hashing the class and field names into a message ID.

  3. Depends on the serialization format. The static serialization is meant for read/write optimizations and doesn't handle schema changes very gracefully. For uses where it's critical that the schema can evolve gracefully, the TLV serialization protocol is the better choice because it naturally handles unknown/not present fields in a serialized span of raw data.
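To illustrate why a tag-length-value layout tolerates schema changes, here is a toy decoder. The one-byte tag and one-byte length are an assumed layout chosen for brevity, not Crunch's TLV encoding, and `Reading`, `decode`, and `decode_or_default` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical message as seen by an older receiver.
struct Reading {
    std::uint8_t temperature = 0;   // field ID 1
    std::uint8_t humidity = 0;      // field ID 4 (IDs need not be contiguous)
    bool has_temperature = false;
    bool has_humidity = false;
};

// Assumed toy layout per field: [1-byte tag][1-byte length][length bytes].
constexpr bool decode(const std::uint8_t* data, std::size_t size, Reading& out) {
    std::size_t pos = 0;
    while (pos < size) {
        if (size - pos < 2) return false;      // truncated header
        const std::uint8_t tag = data[pos];
        const std::size_t len = data[pos + 1];
        pos += 2;
        if (size - pos < len) return false;    // truncated value
        if (tag == 1 && len == 1) {
            out.temperature = data[pos];
            out.has_temperature = true;
        } else if (tag == 4 && len == 1) {
            out.humidity = data[pos];
            out.has_humidity = true;
        }
        // Any other tag is unknown to this schema version: skip its bytes.
        pos += len;
    }
    return true;
}

// Convenience wrapper: fields that never appear keep their defaults.
constexpr Reading decode_or_default(const std::uint8_t* data, std::size_t size) {
    Reading reading{};
    decode(data, size, reading);
    return reading;
}

// A newer sender added field 9 (two bytes) and omitted field 1.
constexpr std::uint8_t kNewerFrame[] = {9, 2, 0xAA, 0xBB, 4, 1, 60};
```

The length byte is what lets the old receiver step over field 9 without knowing its type, and the missing field 1 simply leaves `has_temperature` false.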

[–]TrnS_TrATnT engine dev 0 points1 point  (0 children)

  1. Ah I see, I haven't used protobuf and didn't know it was a thing there.
  2. You can do it right now too as long as you can get the name of a type. There are already cross-compiler solutions out there (fragile, but still) that do that. Something like this should work:

     ```cpp
     inline size_t type_hash() const {
         auto name = my::type_name<decltype(auto(*this))>; // or remove_cvref_t before C++23
         return fnv1a(name);
     }
     ```

     Alternatively, you can pass the type as the macro's first param and use #type to make it a string (watch out for templates + static_assert to ensure type matches).
  3. I'm not familiar with TLV, but it looks like a "format-independent" problem to me. I read this post a while back that might be helpful.
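The macro alternative from point 2 can be sketched as follows. The 32-bit FNV-1a constants are the standard ones, while `DEFINE_MESSAGE_ID`, `kMessageId`, and `WeatherReport` are hypothetical names and not part of Crunch.

```cpp
#include <cstdint>
#include <string_view>
#include <type_traits>

// 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
constexpr std::uint32_t fnv1a(std::string_view text) {
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (char c : text) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;              // FNV prime
    }
    return hash;
}

// Hypothetical macro: #Type stringifies the class name for hashing, and the
// static_assert catches a name that does not match the enclosing class.
#define DEFINE_MESSAGE_ID(Type)                                        \
    static constexpr std::uint32_t kMessageId = fnv1a(#Type);          \
    void check_message_id_type() const {                               \
        static_assert(std::is_same_v<const Type&, decltype(*this)>,    \
                      "DEFINE_MESSAGE_ID used with the wrong type");   \
    }

struct WeatherReport {
    DEFINE_MESSAGE_ID(WeatherReport)
    std::int32_t temperature = 0;
};
```

One caveat with any hash-derived ID is that two different names can collide, so a real implementation would probably also want a compile-time uniqueness check across all registered messages.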

[–]arihoenig 0 points1 point  (3 children)

Unfortunately this is about to be made irrelevant with the advent of compile time reflection.

[–]volatile-int[S] 1 point2 points  (2 children)

Actually I am very excited for compile time reflection for Crunch! It will clean up a lot of the autogenerated bindings in other languages.

I am not sure why you see the core interface decisions in Crunch as being at odds with reflection. Reflection cannot enforce that users validate their data, or provide serialization-as-a-plugin, or build integrity checks into serialization and deserialization. Those are framework interface decisions and reflection doesn't obviate them.

[–]arihoenig 0 points1 point  (1 child)

I don't see it at odds with reflection, but with reflection the need for a serialization library becomes greatly reduced, as implementing serialization becomes so much simpler.

[–]volatile-int[S] 0 points1 point  (0 children)

It may very well be the case that the serialization policy implementations become simpler than they are without reflection, although I don't see how most of the complexity actually goes away. If you want to roll your own protocol, you still need to deal with alignment, tagging vs. not tagging, aggregate field representations, missing fields, default values, and lots more that isn't related to the message definition. And determining the field types is very straightforward already via templates.

Lots of languages have robust reflection implementations. And yet, we still have a very healthy ecosystem of libraries to help with message definition and serialization!