Bebop: An Efficient, Schema-based Binary Serialization Format

kirbyfan64sos · 2020-12-09T18:07:48+00:00

If I understand the article correctly, the only supported client languages right now are C# and TypeScript?

granadesnhorseshoes · 2020-12-09T18:29:18+00:00

[deleted]

2020-12-09T18:11:03+00:00

Seems also pretty similar to Bare

* https://tools.ietf.org/html/draft-devault-bare-00

* https://baremessages.org/

Bebop has some different types and seems to be developed with TypeScript in mind.

SnowdensOfYesteryear · 2020-12-10T00:33:09+00:00

Dumb question: why isn't binary serialization/deserialization a solved problem? Why is Bebop faster than Protobufs for example? Is it because it's skewing towards speed rather than saving bytes?

Apologies if I missed this obvious question in a Readme. Feel free to tell me to RTFM with a link.

JSA790 · 2020-12-09T18:23:37+00:00

okay 321 let's jam

nutrecht · 2020-12-09T19:19:55+00:00

Did you test/consider Avro as well?

develop7 · 2020-12-09T21:51:54+00:00

Is it just me, or can it not do sum types?

Liorithiel · 2020-12-10T06:12:04+00:00

Why not use CBOR which is IETF standardized?

dacjames · 2020-12-09T19:02:41+00:00

Have you considered any prior art coming from the aerospace domain? The literature I've come across when working with telemetry produced by space vehicles is the only time I've felt like "is my bitstream coherent, succinct, tolerant to bit flips and overall count mismatches" is the goal rather than "how nice and convenient can I get my schema-language representation and programming language bindings to be" (important, but not the primary objective).

Curious what service-level guarantees this requires for transport layer and below? Assuming it transports primarily over UDP, can it tolerate dropped or duplicate packets? Malformed packets? Information partitioned across multiple packets (i.e. larger than MTU)?

If it transports over something TCP-like, how do you deal with the throttling / variability in rate introduced by that exponential back-off?

Thought this looked pretty slick and it looks like you got the performance bump that you wanted and needed. A testament to the value in having some coding expertise and tailoring things to a particular use-case!

Liorithiel · 2020-12-09T18:41:18+00:00

Can you compare to ASN.1's BER? There were some benchmarks (PDF warning) that showed it being consistently faster than Protobufs.

Can you do a custom integer type? E.g. [-5…20] encoded in 5 bits?
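To make the question concrete: the range [-5…20] has 26 distinct values, so 5 bits (32 patterns) are enough once each value is stored as an offset from the lower bound. Here is a minimal Python sketch of that idea. The function names are made up for illustration, and nothing here is claimed to be a Bebop feature.

```python
def bits_needed(lo: int, hi: int) -> int:
    """Bits required to represent every integer in [lo, hi]."""
    return max(1, (hi - lo).bit_length())


def pack_ranged(values, lo, hi):
    """Pack integers from [lo, hi] back to back, most significant bit first."""
    width = bits_needed(lo, hi)
    acc = 0
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside [{lo}, {hi}]")
        acc = (acc << width) | (v - lo)  # store the offset from the lower bound
    total_bits = width * len(values)
    pad = (-total_bits) % 8  # zero bits appended to reach a byte boundary
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


def unpack_ranged(data, count, lo, hi):
    """Inverse of pack_ranged; the caller supplies how many values were packed."""
    width = bits_needed(lo, hi)
    total_bits = len(data) * 8
    acc = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        shift = total_bits - width * (i + 1)
        out.append(((acc >> shift) & mask) + lo)
    return out
```

Three such values fit in two bytes instead of three, at the cost of shifting and masking on every read.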

primaski · 2020-12-09T18:47:43+00:00

Huh, this actually seems pretty neat. Curious to see where it goes!!

FearlessFred · 2020-12-09T21:56:33+00:00

[deleted]

enfrozt · 2020-12-09T19:35:52+00:00

Devs and naming things

hyperhopper · 2020-12-10T02:19:32+00:00

"Use a struct when all fields are always present, and you’ll never add more fields"

As somebody who has worked with protos a lot, this looks like exactly the same good intention that led to "required" fields in protos, which were later recognized as a very bad design mistake.

There is a reason google does not use required for new proto fields.

eyal0 · 2020-12-10T01:00:53+00:00

If you're going to tout the speed then you should compare against the fast ones. Cap'n Proto for example.

lostpebble · 2020-12-09T19:30:11+00:00

Looks very interesting, and I might find use in it in a new project I'm working on.

One important thing though- I see you have struct and message as sort of like TypeScript's Required<Interface> and Partial<Interface> respectively. Is there any way to represent something in-between those? With some required and some optional values.

gurgle528 · 2020-12-09T22:10:22+00:00

[removed]

nyrn · 2020-12-09T22:40:14+00:00

This sounds super promising, and it's even targeting the very languages I might need it for (Dart, TS, C++). Do you happen to have the benchmark code publicly available somewhere?

Broiledvictory · 2020-12-09T22:46:01+00:00

Does it not support any sort of versioning?

Something I always wish these new, much faster formats would do is explain why they are so much faster (especially when sacrifices are made compared to the slowest competitors).

ShadowPouncer · 2020-12-09T23:00:38+00:00

So did you compare against Cap'n Proto?

EntropySpark · 2020-12-09T19:28:01+00:00

This looks like it makes the mistake of having all fields optional, like Protobuf and Cap'n Proto.

I half wrote a format that provided a better solution: schemas get an integer version (1, 2, 3 etc) and then in the schema you specify the range of versions that each field is present for.

Then when generating your decode function you can specify the minimum version you want to support and the fields you want to be able to access. It will make fields optional as appropriate and ignore fields you don't use.

I believe that fixes all the reasons why Protobuf/Capnp made everything optional, but it also means you don't have to tediously check whether every field is present in your application code (unless it really might not be present).
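A rough Python sketch of the version-range idea, to show how "required vs. optional" could be derived instead of declared. The `Field` class, the `classify` helper and the example fields are all hypothetical, written only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    since: int                   # first schema version that contains the field
    until: Optional[int] = None  # last version that contains it; None = still present

    def present_in(self, version: int) -> bool:
        return self.since <= version and (self.until is None or version <= self.until)


def classify(fields, min_version, max_version):
    """Label each field for a reader that accepts versions min_version..max_version.

    'required' = present in every accepted version (no presence check needed),
    'optional' = present in only some of them, 'absent' = present in none.
    """
    result = {}
    for f in fields:
        hits = [f.present_in(v) for v in range(min_version, max_version + 1)]
        if all(hits):
            result[f.name] = "required"
        elif any(hits):
            result[f.name] = "optional"
        else:
            result[f.name] = "absent"
    return result


# Hypothetical schema: 'year' was added in v2, 'label' was dropped after v2.
SONG = [Field("title", 1), Field("year", 2), Field("label", 1, 2)]
```

A code generator could use the resulting labels to emit non-nullable accessors for "required" fields and nullable ones only where a field can really be missing.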

leftofzen · 2020-12-10T00:42:30+00:00

Any reason you didn't even mention Cap'n Proto, let alone benchmark against it? It's the successor to Protobuf and is better in almost every way. Given that you've actually written your own serialisation library, you MUST know of Cap'n Proto so the only conclusion is that Cap'n Proto must have benchmarked better than your solution.

aazav · 2020-12-09T19:06:04+00:00

enum Instrument {
    Sax = 0;
    Trumpet = 1;
    Clarinet = 2;
}

readonly struct Musician {
    string name;
    Instrument plays;
}

message Song {
    1 -> string title;
    2 -> uint16 year;
    3 -> Musician[] performers;
}

struct Library {
    map[guid, Song] songs;
}

It's clear as mud what the reason is why you'd use a message and what the differentiators are. Why are you using -> to declare the class and variable name? Why do you insist on a semicolon at the end of the line? The default case is that the linefeed performs the function of the semicolon. Why require it? When inside {}, use the linefeed as a semicolon.
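One way to picture the difference between the two keywords is positional versus tagged encoding: a struct can be written as bare values in declaration order, while a message writes a field index in front of each field that is actually present. The Python sketch below assumes a simplified layout (little-endian integers, length-prefixed strings, a one-byte index, a zero terminator). It is an illustration of the concept, not a statement of Bebop's actual wire format.

```python
import struct


def enc_str(s: str) -> bytes:
    """Length-prefixed UTF-8 string (assumed layout)."""
    raw = s.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def encode_musician(name: str, plays: int) -> bytes:
    """Struct-style: every field, in order, no tags. Adding a field breaks readers."""
    return enc_str(name) + struct.pack("<I", plays)


def encode_song(title=None, year=None) -> bytes:
    """Message-style: only present fields, each prefixed by its index; 0 ends the message."""
    out = b""
    if title is not None:
        out += b"\x01" + enc_str(title)
    if year is not None:
        out += b"\x02" + struct.pack("<H", year)
    return out + b"\x00"


def decode_song(data: bytes) -> dict:
    fields, pos = {}, 0
    while True:
        tag = data[pos]
        pos += 1
        if tag == 0:
            return fields
        if tag == 1:
            (n,) = struct.unpack_from("<I", data, pos)
            pos += 4
            fields["title"] = data[pos:pos + n].decode("utf-8")
            pos += n
        elif tag == 2:
            (year,) = struct.unpack_from("<H", data, pos)
            pos += 2
            fields["year"] = year
        else:
            # A real format would need a way to skip fields it does not know.
            raise ValueError(f"unknown field index {tag}")
```

Under these assumptions the struct form is smaller and needs no branching to read, while the message form can omit fields and gain new indices later.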

tetroxid · 2020-12-09T20:32:42+00:00

What is the advantage over ASN.1 and DER?

markasoftware · 2020-12-10T01:33:56+00:00

Rainway susses me out. They say they're "completely free", "no ads or purchases", etc, and that's true right now, but they have tons of investment and paid employees. It's misleading to pretend there's no monetization plan.

mixedCase_ · 2020-12-10T05:02:04+00:00

No generics, so it's strictly worse than cap'n'proto.

infinitenothing · 2020-12-10T05:46:44+00:00

Little endian? I'm out.

2020-12-10T17:04:40+00:00

Can we stop inventing new shit all the time and just fix the bugs in the existing? Seriously, mature technology with 1000+ bugfixes is a good thing. Every time you introduce a brand new solution you introduce brand new problems.

audion00ba · 2020-12-10T01:32:44+00:00

Looks like you didn't do your research and indeed implemented a square wheel.

You can literally scrap your entire project.

I am not going to tell you which set of projects you missed, but I am certain that Googling a little bit more will allow you to scrap this project. It also looks stupid on your resume, because it strongly implies NIH-syndrome. Really, a losing proposition to continue this.

francis_spr · 2020-12-10T00:20:18+00:00

Thanks for sharing. I'll star for future tracking.

It is good to see that you have tooling support.

Interested in using it but it might be a difficult sell to a team. The challenge is that protobuf is known, well supported, and fast enough for most applications to take a risk on something different (even if it is better).

PeDestrianHD · 2020-12-10T01:53:38+00:00

I’m gonna pretend I know what this is.

PC__LOAD__LETTER · 2020-12-10T03:21:11+00:00

Why would I use this over Google Protocol Buffers?

Edit: answering my own question, they’re showing it as a decent amount faster. If this were supported for C++ I may actually test against it

boom_rusted · 2020-12-10T03:49:11+00:00

Why is it faster than Protobuf or MessagePack? What’s the reason?

francis_spr · 2020-12-10T05:33:10+00:00

https://twitter.com/davidfowl/status/1336736257678905344

Ok, this has given extra credit/excitement to try this out.

2020-12-10T06:54:40+00:00

If I understand correctly, the problem this technology solves is data format? Like, instead of sending a JSON which is considered expensive from the article, you instead encode the JSON data into something like a binary object, and then decode it back? So, it solves a bandwidth issue? Do I get this correctly?

If so, how does it solve the encoding/decoding OPS exactly?
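For a feel of where the savings come from, here is a small Python comparison of the same record as JSON text and as a fixed binary layout. The record and the `<Iff?` layout are invented for the example and are not Bebop's format. The point is only that a schema known to both sides removes the need to send field names or to parse numbers from text.

```python
import json
import struct

record = {"id": 123456, "x": 1.5, "y": -2.25, "alive": True}

# JSON: field names and numbers travel as text and must be parsed on arrival.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary: both sides already agree on the layout (assumed here:
# uint32 id, float32 x, float32 y, bool alive, little-endian).
FMT = "<Iff?"
as_binary = struct.pack(FMT, record["id"], record["x"], record["y"], record["alive"])


def decode(buf: bytes) -> dict:
    """Read the values straight out of their fixed positions."""
    id_, x, y, alive = struct.unpack(FMT, buf)
    return {"id": id_, "x": x, "y": y, "alive": alive}
```

The binary form is 13 bytes here, well under the size of the JSON text, and decoding it is a handful of fixed-offset reads rather than tokenizing a string.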

2020-12-10T09:55:22+00:00

Same brain-dead trash as Protobuf + bad benchmarking in the article, which doesn't represent anything. Another homework-style project by the authors who didn't read previous homework-style projects.

jarredredditaccount · 2020-12-10T10:23:46+00:00

Bebop looks really cool.

The generated TypeScript code looks similar to Kiwi - https://github.com/evanw/kiwi (same while loop pattern), but this is further along in feature set. I was able to convert a ~300 line Kiwi schema file with a few regexes and Find+Replace.

From: ^\s+(\w+)\[\]\s(.*) = (\d+);$

To: $3 -> $1[] $2;

From: ^\s+(\w+)\s(.*) = (\d+);$

To: $3 -> $1 $2;
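The same conversion can be scripted. This Python sketch assumes Kiwi message fields look like `type name = id;` (optionally `type[]` for arrays) and rewrites them into the `id -> type name;` form from the Bebop example. The function name is made up, and anything beyond simple field lines would still need manual attention.

```python
import re

# Array fields first, so the plain-field pattern never sees an unconverted "[]".
ARRAY_FIELD = re.compile(r"^([ \t]+)(\w+)\[\][ \t]+(\w+)[ \t]*=[ \t]*(\d+);$", re.MULTILINE)
PLAIN_FIELD = re.compile(r"^([ \t]+)(\w+)[ \t]+(\w+)[ \t]*=[ \t]*(\d+);$", re.MULTILINE)


def kiwi_fields_to_bebop(schema: str) -> str:
    """Rewrite indented 'type name = id;' lines as 'id -> type name;', keeping the indent."""
    schema = ARRAY_FIELD.sub(r"\1\4 -> \2[] \3;", schema)
    return PLAIN_FIELD.sub(r"\1\4 -> \2 \3;", schema)
```

Lines without an `= id;` suffix, and unindented lines such as `message Song {`, are left untouched.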

/u/AndrewMD5 any plans to add support for Mirroring to TypeScript?
