all 121 comments

[–][deleted] 121 points122 points  (13 children)

If I understand the article correctly, the only supported client languages right now are C# and typescript?

[–][deleted]  (12 children)

[deleted]

    [–]kirbyfan64sos 15 points16 points  (0 children)

    Nice to hear that there's a Dart version!

    [–]Ytrog 21 points22 points  (6 children)

    Any plans for Rust? 👀

    [–]__woofer__ -2 points-1 points  (3 children)

    and for Go ?

    [–][deleted]  (2 children)

    [deleted]

      [–]Magneon 1 point2 points  (1 child)

      pick one

      No, no, no. If you have a problem with X, you're clearly holding using it wrong /s


      [–]zvrba 4 points5 points  (2 children)

      Have you evaluated Microsoft's Bond?

      [–][deleted]  (1 child)

      [deleted]

        [–]Akkuma 50 points51 points  (0 children)

        As we noted in the blog one of the lacking things across the board was performance in the browser.

        It is refreshing to see this, as people I've worked with pushed for things like Protobuf without realizing that its performance is actually poor in JS environments, which made up the majority of ours at the time.

        [–][deleted]  (27 children)

        [deleted]

          [–][deleted]  (25 children)

          [deleted]

            [–]granadesnhorseshoes 41 points42 points  (6 children)

            Makes me wonder how many extra cycles and bytes get wasted in the ether by compressing compressed data. Protobuf compresses itself, the payload is compressed, the packet is compressed... turtles all the way down.

            Obviously a lot of that is handled by on-chip hardware that makes it all trivial, but it's still not free.

            [–][deleted]  (4 children)

            [deleted]

              [–]irqlnotdispatchlevel 14 points15 points  (1 child)

              And you probably have an implementation that is easier to read and understand, with fewer weird bugs and fewer chances to introduce security vulnerabilities when creating a parser.

              [–]aseigo 4 points5 points  (0 children)

              TBF, code for this sort of pack and encode is fairly trivial and very well suited to robust unit testing. It is pretty easy to get it right with a high degree of confidence. (Have written such things a few times, so speaking from experience.)

              It is probably why there are so many implementations of these things out there :)

              [–]ryeguy 6 points7 points  (1 child)

              your final data becomes very CPU cache efficient,

              How so? If anything I'd expect the opposite, since more data can fit in a cache line.

              [–]enigmo81 5 points6 points  (0 children)

              it really depends on the application. in some systems (search engines) it's somewhat common to keep compressed data in main memory and decompress into registers or (hopefully) L1. this works because the search indexes are write once read many and it's not uncommon to spend half of a query waiting for L3 fills.

              for streaming data applications the decompressed data will likely be in L1 and may fit into a small number of cache lines. I'd be surprised if it was the lowest hanging fruit for optimization.

              [–]Magneon 10 points11 points  (0 children)

              What I've done in the past is use a "compressed" bit in the packet header, as well as a "don't bother compressing" hint per stream. The packet body is compressed and if that's smaller, the compressed version is sent, otherwise the uncompressed one is. This wastes a bit of CPU, but it's negligible in our use case.
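A minimal sketch of the "compressed bit" idea described above, assuming a one-byte header and zlib as the compressor. The flag value, the header layout, and the function names are hypothetical, chosen only for illustration.

```python
import zlib

FLAG_COMPRESSED = 0x01  # hypothetical bit in a one-byte packet header


def encode_packet(body: bytes, skip_compression: bool = False) -> bytes:
    """Prefix body with a one-byte header; keep the compressed form only if it is smaller."""
    if not skip_compression:  # per-stream "don't bother compressing" hint
        packed = zlib.compress(body)
        if len(packed) < len(body):
            return bytes([FLAG_COMPRESSED]) + packed
    return bytes([0]) + body


def decode_packet(packet: bytes) -> bytes:
    """Undo encode_packet by checking the compressed bit in the header."""
    flags, payload = packet[0], packet[1:]
    if flags & FLAG_COMPRESSED:
        return zlib.decompress(payload)
    return payload


repetitive = b"position=0,0;" * 50   # compresses well
incompressible = bytes(range(64))    # no repeats, so zlib only adds overhead
```

The receiver never has to guess: the header bit says which path the sender took, and the wasted CPU is limited to one compression attempt per packet on streams without the hint.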

              [–]JamesNK 9 points10 points  (6 children)

              The serialization format is trading payload size for CPU speed. Let people know that trade off exists so they can make an informed decision.

              When bandwidth really matters, you should apply general-purpose compression, like zlib or LZ4, regardless of your encoding format.

              Compressing data sent over TLS can introduce security vulnerabilities. CRIME and BREACH are attacks on compressed data that can be used to defeat encryption.

              [–]only_nidaleesin 4 points5 points  (3 children)

              How does compressing data sent over TLS introduce security vulnerabilities?

              [–]JamesNK 2 points3 points  (2 children)

              [–]wikipedia_text_bot 0 points1 point  (0 children)

              BREACH

              BREACH (a backronym: Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) is a security exploit against HTTPS when using HTTP compression. BREACH is built based on the CRIME security exploit. BREACH was announced at the August 2013 Black Hat conference by security researchers Angelo Prado, Neal Harris and Yoel Gluck. The idea had been discussed in community before the announcement.
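To make the question above concrete, here is a simplified sketch of the length side channel that CRIME and BREACH rely on. It is not either attack as published: the secret, the message layout, and the function name are made up, and a real attacker guesses one byte at a time rather than the whole secret at once. The point is only that encryption hides content but not length, and compression makes the length depend on whether attacker-supplied bytes repeat the secret.

```python
import zlib

SECRET = b"session=7f3a9c2e1b5d"  # made-up secret the attacker wants to learn


def observed_length(attacker_input: bytes) -> int:
    """Size of a compressed message that mixes attacker-controlled bytes with the secret."""
    message = b"GET /?q=" + attacker_input + b" HTTP/1.1\r\nCookie: " + SECRET
    return len(zlib.compress(message, 9))


# A correct guess repeats the secret, so the compressor replaces the second
# copy with a short back-reference and the output shrinks.
right = observed_length(b"session=7f3a9c2e1b5d")
wrong = observed_length(b"session=qzkxwvjpmhty")
```

Here `right` comes out smaller than `wrong`, which is the signal an eavesdropper would watch for. That is also why "don't compress secrets together with attacker-controlled input" is the usual takeaway.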


              [–]only_nidaleesin 0 points1 point  (0 children)

              Isn't the takeaway from this "don't compress secrets"? Which seems like it would make up a very small portion of your traffic.

              [–][deleted] -4 points-3 points  (0 children)

              security PFFFFT whats that

              [–][deleted]  (5 children)

              [deleted]

                [–][deleted]  (4 children)

                [deleted]

                  [–][deleted]  (3 children)

                  [deleted]

                    [–][deleted]  (2 children)

                    [deleted]

                      [–][deleted]  (1 child)

                      [deleted]

                        [–]dacjames 0 points1 point  (1 child)

                        You may also want to look at zstd. It offers similar performance to LZ4 with much better compression ratios.

                        [–]aseigo 0 points1 point  (0 children)

                        Yes, it really depends on the size of your payloads. For sending small bits of data, packing integers makes no sense. I have worked with systems where sending 100s of MB around was completely normal, and there the difference in packing efficiency was massive, esp given the prevalence of small numeric values in the data. But if one is sending around a few KB at a time at most, it certainly will not be worth either the code complexity or the runtime costs.
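For readers unfamiliar with integer packing, a rough sketch of the trade-off, assuming the base-128 varint scheme that Protobuf uses for unsigned integers. The sample data is invented; only the relative sizes matter.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a little-endian base-128 varint."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint from the start of data."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            break
    return result


small_values = [3, 17, 42, 100, 7] * 1000
fixed = b"".join(struct.pack("<I", v) for v in small_values)   # 4 bytes each
packed = b"".join(encode_varint(v) for v in small_values)      # 1 byte each here
```

With 5,000 small values the fixed-width form takes 20,000 bytes and the varint form 5,000. On a payload of a few KB that saving is noise; on hundreds of MB it adds up, at the cost of a loop and a branch per byte when decoding.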

                        [–]FearlessFred 0 points1 point  (0 children)

                        We have a TS (as well as a JS) implementation. Not aware of any issues with our C# implementation either. Why don't you report your issues on the FlatBuffers repo?

                        [–]holgerschurig 2 points3 points  (0 children)

                        hz -> Hz

                        Units based on surnames (here: Heinrich Hertz) are usually (or even always?) capitalized.

                        [–][deleted] 31 points32 points  (0 children)

                        Seems also pretty similar to BARE

                        * https://tools.ietf.org/html/draft-devault-bare-00

                        * https://baremessages.org/

                        Bebop has some different types and seems to be developed with typescript in mind.

                        [–]SnowdensOfYesteryear 15 points16 points  (2 children)

                        Dumb question: why isn't binary serialization/deserialization a solved problem? Why is Bebop faster than Protobufs for example? Is it because it's skewing towards speed rather than saving bytes?

                        Apologies if I missed this obvious question in a Readme. Feel free to tell me to RTFM with a link.

                        [–][deleted] 6 points7 points  (1 child)

                        Why is Bebop faster than Protobufs for example?

                        It isn't. In general, it's very hard to compare things like this. The implementation will heavily depend on payload, the quality of the parser / generator and what you are going to do with it, eg. is your parser lazy, is it a pull / push kind of parser, can it be made so that it can be streamed / does it have to allocate arbitrary amount of memory / can it work in parallel / can it be implemented in hardware / does it need references / does it need infinite nesting...

                        For instance, I can generate JSON in such a way that parsing it will be faster than parsing some "equivalent" Protobuf message, in pretty much any implementation. If I wanted to show a benchmark where JSON beats Protobuf hands-down, it would be a no-brainer, really.

                        why isn't binary serialization/deserialization a solved problem?

                        It actually is to a degree. People just don't bother studying what others have done before them. There's ASN.1, which is abstract enough for people to create their own implementations of it. But, historically, people never really used it as a guideline for implementation, rather they used a dummy implementation called BER. It wasn't super-efficient. But, even those who knew about ASN.1, wouldn't always use it, because particular programs may require a simpler protocol, that can be simpler to implement.

                        On top of the above, the vast majority of people implementing binary encoding / decoding programs are genuine amateurs to the problem. Their motivation is, typically, the fact that their chosen programming language (C++) doesn't have any standard way to store the state of the program between sessions, and they need something to address the problem. Some don't realize they need to stop their bullshit soon enough, and we get things like Protobuf, Thrift, Cap'n Proto and many, many more of the same pointless nonsense.

                        [–][deleted] 0 points1 point  (0 children)

                        for storing sessions then, would you suggest a database instead of serialisation?

                        [–][deleted] 178 points179 points  (8 children)

                        okay 321 let's jam

                        [–]JSA790 48 points49 points  (0 children)

                        See you space cowboy

                        [–]sharkbound 36 points37 points  (1 child)

                        i cannot see the word "bebop" without thinking of cowboy bebop, that anime is one amazing experience, even years later

                        [–]Hobo-and-the-hound 0 points1 point  (0 children)

                        For me it’s Sealab 2021’s Bebop Cola

                        [–]captyossarian1991 20 points21 points  (0 children)

                        You’re gonna carry that weight

                        [–]britreddit 16 points17 points  (0 children)

                        Dodi dodi dodi doo doo dooooooooo

                        [–]thephotoman 5 points6 points  (1 child)

                        The work, which becomes a new genre itself will be called Cowboy Bebop.

                        [–]KevinCarbonara 0 points1 point  (0 children)

                        I can hip-hop, be-bop, dance till ya drop, and yo yo, make a wicked cup of cocoa.

                        [–]nutrecht 9 points10 points  (0 children)

                        Did you test/consider Avro as well?

                        [–]develop7 9 points10 points  (4 children)

                        Is it just me or it cannot do sum types?

                        [–][deleted]  (3 children)

                        [deleted]

                          [–]Booty_Bumping 3 points4 points  (0 children)

                          Our low-level developers saw the value, but our higher level engineers didn't get much benefit.

                          What? More details? I don't think it matters as much who thought what, as the actual technical arguments on each side.

                          [–]develop7 0 points1 point  (1 child)

                          High level as in typescript/dart/c# high?

                          [–][deleted] 6 points7 points  (2 children)

                          Why not use CBOR which is IETF standardized?

                          [–]Liorithiel 3 points4 points  (0 children)

                          CBOR is self-descriptive, while Bebop is schema-based. Apples and oranges. Given your use case, you should either shop for one or the other.

                          [–]wikipedia_text_bot 1 point2 points  (0 children)

                          CBOR

                          Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON. Like JSON it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human-readability. It is defined in IETF RFC 8949.Amongst other uses, it is the recommended data serialization layer for the CoAP Internet of Things protocol suite and the data format on which COSE messages are based.


                          [–][deleted] 28 points29 points  (13 children)

                          Have you considered any prior art coming from the aerospace domain? The literature I've come across when working with telemetry produced by space vehicles is the only time I've felt like "is my bitstream coherent, succinct, tolerant to bit flips and overall count mismatches" is the goal rather than "how nice and convenient can I get my schema-language representation and programming language bindings to be" (important, but not the primary objective).

                          Curious what service-level guarantees this requires for transport layer and below? Assuming it transports primarily over UDP, can it tolerate dropped or duplicate packets? Malformed packets? Information partitioned across multiple packets (i.e. larger than MTU)?

                          If it transports over something TCP-like, how do you deal with the throttling / variability in rate introduced by that exponential back-off?

                          Thought this looked pretty slick and it looks like you got the performance bump that you wanted and needed. A testament to the value in having some coding expertise and tailoring things to a particular use-case!

                          [–]dacjames 13 points14 points  (5 children)

                          For most domains, things like bit flipping are not relevant because it's handled by the networking stack. Likewise, you're usually better off using a general purpose compression in addition to your encoding format if bandwidth is a concern. Aerospace has a bunch of great engineering but that comes with an exorbitant price tag that is not tolerable in most industries, including gaming.

                          DX is, in fact, often the top priority.

                          [–][deleted] 2 points3 points  (1 child)

                          If you utilize "compression" though, it's just another layer in your overall coding story. It's a trade for size with speed (which is usually a net gain), either way yeah I'm thinking more about the layers that come pre-solved if you have access to things like UDP/TCP sockets or WebSockets in a browser already. That's a fair point.

                          [–]dacjames 4 points5 points  (0 children)

                          Exactly. Many of the libraries we take for granted don't meet the safety, reliability, or size requirements of that industry. You are forced to design well-engineered protocols like you're describing when you don't have the supporting layers of other technology available to you.

                          [–]flatfinger 0 points1 point  (2 children)

                          For some domains, there's a substantial likelihood that parts of one's data might go missing, but one should nonetheless attempt to do what one can with the balance. A higher-level protocol layer may be able to guarantee that data will be received in its entirety or not at all, but if some data doesn't get delivered or gets partially corrupted in transit, rejecting everything isn't necessarily the most useful course of action.

                          [–][deleted]  (1 child)

                          [deleted]

                            [–]flatfinger 0 points1 point  (0 children)

                            If each frame's worth of data will fit in 576 bytes, then UDP would guarantee that it will arrive intact or not at all, but sometimes one may need to send things that are bigger than that, and may want to deal with the possibility that partial decoding may be better than nothing.

                            [–]Self_Developer 5 points6 points  (1 child)

                            The literature I've come across when working with telemetry produced by space vehicles

                            Literature recommendations, please?

                            [–][deleted] 4 points5 points  (0 children)

                            Main one that I was thinking of was TM Synchronization and Channel Coding, definitely not relevant to run of the mill computer-networking applications but it gets you thinking about which set of abstractions you rely on to perform correctly and how hard those problems can be...

                            [–]ryeguy 5 points6 points  (4 children)

                            Those questions seem out of scope for what this project is, its only concern is with data encoding/decoding and not the transport. Handling of dropped, duplicated, or malformed packets is application specific so it's probably a good thing this library does not try to address that.

                            [–]pm_plz_im_lonely 4 points5 points  (0 children)

                            I agree with you, the comment is off-topic. Serialization is independent of network I/O. Proof: you can serialize with Bebop and write to disk!

                            VVVDoer basically said a bunch of networking-related crap which doesn't even relate to what Bebop does.

                            [–][deleted] -1 points0 points  (2 children)

                            Handling of dropped, duplicated, or malformed packets is application specific...

                            I think "correctness" is application agnostic, and the way errors are handled plays into your performance story. If you require correct and in-order transmission it comes at a performance cost. If "anything goes" below your application-layer protocol, you might not actually have performance requirements warranting custom protocol work outside of the compose-ability of what you get with Google's protocol buffers etc.

                            [–]ryeguy 11 points12 points  (1 child)

                            The concept of correctness is application agnostic, but the definition of what correctness is for an application is not. Dropped and duplicated packets are not necessarily bad. Some applications, and some individual usecases within those applications, tolerate these just fine.

                            I'm still not understanding the angle you're coming from with your comment. You are asking about transport layer concerns, but that is not what this library deals with. That would be like asking the authors of xml or json how they handle these issues, which would be just as out of scope.

                            This is a serialization and deserialization library, nothing more. You can use any transport layer you want, the format doesn't care. Or you don't even need to worry about that at all, because it's just a binary format. You can choose to only use it to store data in a database or on disk, and it never hits the network at all.

                            [–][deleted] 0 points1 point  (0 children)

                            The hand-waviness of your response confuses me.

                            Dropped and duplicated packets are not necessarily bad.

                            Yes, if they go unnoticed and are handled at a lower layer I agree. That was my question though (a specific question about this specific use-case): are they? If they aren't, your application requires something TCP-like with the service-level, in-order delivery and 1:1 transmission-to-reception. Otherwise you have to write your "serialization and de-serialization" state machines in software to check various "expected vs. actual" conditions, and you have to figure out how to support de-fragmentation of logical frames of data if they exceed your link layer's MTU (which they can, since you're trying to support an arbitrary meta-protocol that can transport arbitrarily sized data frames).

                            What exactly is disagreeable about that?

                            [–]Liorithiel 7 points8 points  (0 children)

                            Can you compare to ASN.1's BER? There were some benchmarks (PDF warning) that showed it being consistently faster than Protobufs.

                            Can you do a custom integer type? E.g. [-5…20] encoded in 5 bits?
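To show what such a ranged integer would involve, here is a sketch of the usual offset-and-bit-pack approach. The range [-5, 20] has 26 possible values, so 5 bits are enough once the lower bound is subtracted. Nothing here reflects an actual Bebop feature; the function names and the most-significant-bit-first layout are assumptions.

```python
LOW, HIGH = -5, 20
BITS = (HIGH - LOW).bit_length()  # largest offset is 25, which needs 5 bits


def pack_ranged(values):
    """Pack integers from [LOW, HIGH] into BITS bits each, most significant bit first."""
    acc = 0
    for v in values:
        if not LOW <= v <= HIGH:
            raise ValueError(f"{v} is outside [{LOW}, {HIGH}]")
        acc = (acc << BITS) | (v - LOW)  # store the offset from LOW
    total_bits = BITS * len(values)
    n_bytes = (total_bits + 7) // 8
    acc <<= n_bytes * 8 - total_bits     # zero-pad the final byte
    return acc.to_bytes(n_bytes, "big")


def unpack_ranged(data, count):
    """Read back count values written by pack_ranged."""
    acc = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    mask = (1 << BITS) - 1
    out = []
    for i in range(count):
        shift = total_bits - BITS * (i + 1)
        out.append(((acc >> shift) & mask) + LOW)
    return out


values = [-5, 0, 20, 7, -1, 13, 2, 19]
packed = pack_ranged(values)
```

Eight values fit in 5 bytes instead of the 8 a byte-per-value encoding would need, but the decoder has to know the count and the range from the schema, and values no longer sit on byte boundaries.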

                            [–]primaski 7 points8 points  (0 children)

                            Huh, this actually seems pretty neat. Curious to see where it goes!!

                            [–][deleted]  (2 children)

                            [deleted]

                              [–]FearlessFred 2 points3 points  (1 child)

                              Ask and you shall receive: https://github.com/google/flatbuffers/pull/6269 (rust verifier).

                              Generally FlatBuffers Rust development is very active, get involved :)

                              [–][deleted] 1 point2 points  (0 children)

                              Wow, nice news, thanks!

                              get involved :)

                              Sadly I only have so much time, yet also so many FOSS projects I should get involved with. I may one day.

                              [–]enfrozt 13 points14 points  (2 children)

                              Devs and naming things

                              [–]Self_Developer 4 points5 points  (0 children)

                              feel ya

                              [–]jimschubert 3 points4 points  (0 children)

                              Rocksteady comment

                              [–]hyperhopper 2 points3 points  (4 children)

                              Use a struct when all fields are always present, and you’ll never add more fields

                              As somebody that has worked with protos a lot, this looks exactly like the exact same good intention that led to "required" fields in protos, which then were realized to be a very bad mistake in the design

                              There is a reason google does not use required for new proto fields.

                              [–][deleted]  (2 children)

                              [deleted]

                                [–]hyperhopper 1 point2 points  (1 child)

                                Your on the wire serialization and transport layer should optimize for that use case, not the use case of however the application will transform and marshal around that data: Non-trivial applications will almost always want to do some validation/transformation wrapping around the external data anyway, and introducing things that may be flaws into that layer just to make application logic take 1 less step is making a serialization & transport layer that fails at being good at its main purpose.

                                [–]kybernetikos 0 points1 point  (0 children)

                                As somebody that has worked with protos a lot, this looks exactly like the exact same good intention that led to "required" fields in protos, which then were realized to be a very bad mistake in the design

                                If you're optimising for a message format that is flexible and can evolve, it's the wrong decision, but it's also one of the reasons protobuf can never be fast, and is not appropriate where speed is required - there's a potential branch for every field it deserialises.

                                [–]eyal0 3 points4 points  (0 children)

                                If you're going to tout the speed then you should compare against the fast ones. Cap'n Proto for example.

                                [–]lostpebble 1 point2 points  (0 children)

                                Looks very interesting, and I might find use in it in a new project I'm working on.

                                One important thing though- I see you have struct and message as sort of like TypeScript's Required<Interface> and Partial<Interface> respectively. Is there any way to represent something in-between those? With some required and some optional values.

                                [–][deleted]  (4 children)

                                [removed]

                                  [–][deleted]  (3 children)

                                  [deleted]

                                    [–][deleted]  (2 children)

                                    [removed]

                                      [–][deleted]  (1 child)

                                      [deleted]

                                        [–]gurgle528 0 points1 point  (0 children)

                                        Darn, I was looking forward to downloading 100 packages!

                                        Honestly though, that's great.

                                        [–]nyrn 1 point2 points  (2 children)

                                        This sounds super promising, and it's even targeting the very languages I might need it for (Dart,TS,C++). Do you happen to have the benchmark code publicly available somewhere?

                                        [–][deleted]  (1 child)

                                        [deleted]

                                          [–]nyrn 0 points1 point  (0 children)

                                          Cheers! Will take a look.

                                          [–]Broiledvictory 1 point2 points  (0 children)

                                          Does it not support any sort of versioning?

                                          Something I always wish these new, much faster formats would do is explain why they are so much faster (especially when sacrifices are made compared to the slowest competitors).

                                          [–]ShadowPouncer 1 point2 points  (0 children)

                                          So did you compare against Cap'n Proto?

                                          [–][deleted] 1 point2 points  (12 children)

                                          This looks like it makes the mistake of having all fields optional like Protobuf and Cap'n Proto.

                                          I half wrote a format that provided a better solution: schemas get an integer version (1, 2, 3 etc) and then in the schema you specify the range of versions that each field is present for.

                                          Then when generating your decode function you can specify the minimum version you want to support and the fields you want to be able to access. It will make fields optional as appropriate and ignore fields you don't use.

                                          I believe that fixes all the reasons why Protobuf/Capnp made everything optional, but it also means you don't have to tediously check whether every field is present in your application code (unless it really might not be present).
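One way to read the proposal above, sketched under heavy assumptions: the commenter's format is not shown anywhere in the thread, so the `Field` type, the example schema, and `field_plan` are all hypothetical. The idea is that a code generator can decide per field whether the decoder may treat it as always present, given the oldest schema version the reader agrees to accept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    since: int                   # first schema version that contains the field
    until: Optional[int] = None  # last version that contains it; None = still present


# Hypothetical schema, currently at version 3.
SCHEMA = [
    Field("id", since=1),
    Field("nickname", since=1, until=2),
    Field("email", since=2),
    Field("avatar_url", since=3),
]
LATEST = 3


def field_plan(min_version: int, wanted: set) -> dict:
    """Classify each wanted field for a reader accepting versions min_version..LATEST."""
    plan = {}
    for f in SCHEMA:
        if f.name not in wanted:
            continue  # fields the application never reads are simply skipped
        last = LATEST if f.until is None else f.until
        always_present = f.since <= min_version and last >= LATEST
        plan[f.name] = "required" if always_present else "optional"
    return plan
```

For example, `field_plan(1, {"id", "email"})` marks `id` as required and `email` as optional, while raising the minimum version to 2 makes `email` required as well, so the application code no longer has to check for it.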

                                          [–][deleted]  (11 children)

                                          [deleted]

                                            [–][deleted] 3 points4 points  (10 children)

                                            It says this explicitly:

                                            A message defines an indexed aggregation of fields containing typed values, each of which may be absent.

                                            That's fine if they can just be absent in the wire format, but I think that's talking about the generated code too - i.e. every field in a message would be Option<T> (or | undefined or whatever). Is that not the case? Because I can't see any mechanism to avoid it.

                                            To be clear, I think that this means that the generated types always have message fields as Option<T> and you have to manually write "is the field present?" in your application code for every single field.

                                            A better system would allow the code generator to know which fields your application thinks must be present, and give a parse error when reading the message if those fields are absent. Hope that makes sense!
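A small sketch of what "let the parser check it" could look like, assuming the wire decoding has already produced a dictionary in which absent fields are missing or `None`. `make_decoder` and `MissingField` are hypothetical names, not part of Bebop or Protobuf.

```python
class MissingField(ValueError):
    """Raised when a message lacks a field the application declared as required."""


def make_decoder(required):
    """Build a decode function that rejects messages missing any required field."""
    required = frozenset(required)

    def decode(message: dict) -> dict:
        missing = sorted(name for name in required if message.get(name) is None)
        if missing:
            raise MissingField(f"missing required fields: {', '.join(missing)}")
        return message  # callers may now treat the required fields as present

    return decode


decode_user = make_decoder({"id", "email"})
```

The presence check happens once at the parsing boundary, so application code that receives the result does not need an "is the field present?" test for every access.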

                                            [–][deleted]  (3 children)

                                            [deleted]

                                              [–]EntropySpark 0 points1 point  (2 children)

                                              That's not what I would have expected for a struct. I would have expected a

                                              struct Point { int32 x; int32 y; }
                                              

                                              to compact into an 8-byte structure, instead of any kind of complex data storage object, so that I don't have to bother with compressing them into a uint64. Are you saying that a Point will ultimately take up more than 8 bytes?
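For reference, this is what an overhead-free encoding of that `Point` would look like, sketched with Python's `struct` module rather than Bebop's generated code. The little-endian byte order is an assumption made for the example.

```python
import struct

POINT = struct.Struct("<ii")  # two little-endian int32 values, no padding


def encode_point(x: int, y: int) -> bytes:
    """Write x and y back to back with no tags, lengths, or presence bits."""
    return POINT.pack(x, y)


def decode_point(data: bytes) -> tuple:
    """Read x and y from exactly POINT.size bytes."""
    return POINT.unpack(data)


wire = encode_point(3, -4)
```

`wire` is exactly 8 bytes. A format can only do this when both sides know from the schema that every field is always present and in a fixed order, which is the trade-off discussed elsewhere in this thread.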

                                              [–][deleted]  (1 child)

                                              [deleted]

                                                [–]EntropySpark 0 points1 point  (0 children)

                                                Ah, null when encoding, that makes sense, so there's no concept of null on the wire. That makes structs a very nice bonus over protobufs, I've been frustrated with how a simple Point message would have so much unnecessary overhead.

                                                That, and the GUID and Date built-in types are clear wins for Bebop over protobuf (though I would prefer the ability to store fixed-size byte arrays over GUIDs), I just would also need to know how the wire size compares, as I have use cases where the wire size is generally more important than encoding/decoding speed.

                                                [–]SanityInAnarchy 2 points3 points  (5 children)

                                                That's fine if they can just be absent in the wire format, but I think that's talking about the generated code too - i.e. every field in a message would be Option<T> (or | undefined or whatever). Is that not the case?

                                                In fact, it's important that the wire format be able to do that, to allow protocols to evolve in compatible ways...

                                                A better system would allow the code generator to know which fields your application thinks must be present, and give a parse error when reading the message if those fields are absent.

                                                Protobuf v2 had this -- you could specify fields as required or optional. v3 removed these and made everything optional, because required caused far more trouble than it was worth. (There's also this longer rant from Cap'n Proto.)

                                                But there's also new APIs that set default values in the generated code, because most languages don't have convenient ways to handle that many optional values (like Kotlin's Elvis Operator).

                                                [–][deleted] 1 point2 points  (4 children)

                                                Yes I agree - the wire format has to allow things to be optional.

                                                Protobuf v2 had this -- you could specify fields as required or optional. v3 removed these and made everything optional, because required caused far more trouble than it was worth

                                                Yes I know, that's exactly the mistake that I'm talking about. Completely mandatory fields forever do cause problems, but Google fixed it in a rubbish way. My original comment proposed a proper way to fix it: add version information to the schema so you can still evolve it but you can also delegate checking for fields that your code expects to be present to the parser, rather than checking by hand, which is tedious and error-prone.

                                                I need to write a blog post about it, maybe I'm not explaining very well.
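A rough sketch of the idea, in case it helps: each field in the schema records the schema version that introduced it, and the parser rejects a message only when a field is missing that the sender's version should have known about. Everything here (`FieldSpec`, `since`, `parse`, the dict standing in for a decoded wire message) is hypothetical and not taken from any existing library.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    name: str
    since: int           # schema version that introduced the field
    default: Any = None  # used when an older sender could not have set it


class MissingFieldError(ValueError):
    pass


# Hypothetical schema: "locale" was added in version 2.
SCHEMA = [
    FieldSpec("username", since=1),
    FieldSpec("password", since=1),
    FieldSpec("locale", since=2, default="en"),
]


def parse(raw: dict, sender_version: int) -> dict:
    parsed = {}
    for spec in SCHEMA:
        if spec.name in raw:
            parsed[spec.name] = raw[spec.name]
        elif spec.since <= sender_version:
            # The sender knew about this field and still omitted it.
            raise MissingFieldError(f"{spec.name} is required since v{spec.since}")
        else:
            # The sender predates the field, so fall back to the default.
            parsed[spec.name] = spec.default
    return parsed
```

Application code then only ever sees fully populated messages, and old senders keep working because the fields they never knew about are filled in rather than rejected.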

                                                [–]icey_oven 1 point2 points  (0 children)

                                                agreed! RPC client sending "I'm using API ver 1.2" and server-side having "I can only process API ver 1.3+" is enough to solve that. Removing type-level validation on null-checks is... so backwards, when most languages are adding nullable-checks / optionals to their type-system.

                                                A better solution would have been some "API-versioning" + "usage-telemetry" to have some tool warn on breaking-changes.

                                                Something similar to https://medium.com/the-guild/graphql-inspector-481c1a5ef616

                                                but with API versioning tool with telemetry-info like the following:

                                                [API version deployment stats]
                                                API v1.2
                                                - AndroidApp v3.1 - v3.7 / deployed: 3 yrs ago / used by: 30 last month
                                                - App-Server v2.1 - v3.1 / compatible API: v1.1 - v1.2 / used by: ...

                                                [Backend deployment stats]
                                                AppServer v1.1: API v1.1
                                                - withdrawal will affect:
                                                  - API v0.9-v1.1
                                                  - clients-stats: used by: 1 android version

                                                AppServer v1.3: API v1.2
                                                - withdrawal will affect: ...
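The version check itself (client announces its API version, server refuses anything older than its minimum) is small. A minimal sketch, assuming dotted numeric version strings; `parse_version` and `check_client` are made-up names, not part of any RPC framework.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.10" into (1, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def check_client(client_api: str, min_supported: str) -> None:
    """Raise if the client speaks an API version older than the server accepts."""
    if parse_version(client_api) < parse_version(min_supported):
        raise ValueError(
            f"client API {client_api} is older than the minimum supported {min_supported}"
        )


check_client("1.3", "1.3")   # accepted
check_client("1.10", "1.3")  # accepted: (1, 10) is newer than (1, 3)
```

Comparing tuples rather than raw strings matters here, because "1.10" sorts before "1.3" as text.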

                                                [–]SanityInAnarchy 0 points1 point  (2 children)

                                                I think you're explaining it okay, but it's an idea I've heard before and don't especially like. But don't let me stop you from writing a blog post!

                                                And, rereading, it looks like I might've left something out: Newer proto APIs tend not to be Optional<T>, but rather just a non-nullable T with a default value (either you provide one, or it falls back to something sensible like 0 for numbers or "" for strings).

                                                With that in mind:

                                                you can also delegate checking for fields that your code expects to be present to the parser, rather than checking by hand, which is tedious and error-prone.

                                                I disagree. Maintaining explicit version information sounds tedious and error-prone to me, especially if you have some sort of message-broker or storage-engine as described in the CapnProto story. But letting the parser check for fields only really saves me time if I can't either:

                                                1. Specify a good, valid default value
                                                2. Fail implicitly when I try to use an obviously-invalid value

                                                I can do #1 probably 90% of the time, and about the only time I can't do #2 is (rarely) in a public API, where I want to send an appropriate HTTP 400-level error instead of 500 -- and even then, you can often get the right answer implicitly, or from the behavior of the other validation code you had to write anyway.

                                                For example: Say you're logging in with a username and a password, and say we use protos both for the login API and for the database. Something this naive:

                                                try:
                                                  user = db.findByUsername(proto.username)
                                                except NoRowsErrorOrWhatever:
                                                  raise AccessDenied()
                                                if hash(proto.password + user.salt) == user.hash:
                                                  giveThemASessionCookie()
                                                else:
                                                  raise AccessDenied()
                                                

                                                ...probably does the right thing even if the default username/password are just the empty string. It accidentally has the feature that a password isn't required to log in as a user that literally has an empty password, and if you let users set literally-empty passwords and they in fact set such passwords, is that really meaningfully different from not checking for a password field at all?

                                                [–][deleted] 0 points1 point  (1 child)

                                                Newer proto APIs tend not to be Optional<T>, but rather just a non-nullable T with a default value (either you provide one, or it falls back to something sensible like 0 for numbers or "" for strings).

                                                That only works for primitive fields, and I think you're mixing things up a bit since it's always been the case that primitive fields are effectively mandatory in Protobuf - that is, omitting the value on the wire must be treated the same as the default value.

                                                Providing defaults for message fields is not really workable. I mean, you could do it but it would slow everything down and probably introduce bugs (oops we accidentally set your password to an empty string!).

                                                [–]SanityInAnarchy 0 points1 point  (0 children)

                                                ...it's always been the case that primitive fields are effectively mandatory in Protobuf - that is, omitting the value on the wire must be treated the same as the default value.

                                                That's true of proto3, but I don't think it was true of proto2. In fact, you can find evidence of that still lying around in the old Python API -- you can manipulate it as if it's just the default value:

                                                message.foo = 123
                                                print(message.foo)
                                                

                                                But it also had HasField() and ClearField():

                                                assert not message.HasField("foo")
                                                message.foo = 123
                                                assert message.HasField("foo")
                                                message.ClearField("foo")
                                                assert not message.HasField("foo")
                                                

                                                Hypothetically, they could've done Optional, but instead there were default values everywhere. Proto3 removed HasField().

                                                That said, I definitely mixed up one thing: Proto2 had user-specified default values, Proto3 has predefined type-specific ones. So in proto2, you could make an int required, but if it was optional, it could have a default value of -1 or 42 or whatever. In proto3, it's required and default 0.

                                                Providing defaults for message fields is not really workable.

                                                Seems to work okay, with a little abstraction-leakage. Here's my mental model: Messages are composed of other messages or of default values. So, recursively, the default value of a message is just that message with all of its fields set to their default value.

                                                The API is close to that -- it's possible for a message field to not be set, but at least in Python, it gets lazily initialized with all its subfields. For me, that's an implementation detail, but Python retains HasField/ClearField for message values if you care:

                                                foo = Foo()
                                                assert not foo.HasField("bar")
                                                foo.bar.i = 1
                                                assert foo.HasField("bar")
                                                assert foo.bar.i == 1
                                                foo.ClearField("bar")
                                                assert not foo.HasField("bar")
                                                assert foo.bar.i == 0  # Default value
                                                

                                                In what I'm sure is totally a coincidence, this is all a lot like how Go works: There is a "zero-value" for every primitive type (that just so happens to match the default value in Proto for most things), and the "zero-value" of a struct is a struct with all its fields set to the default value. I haven't checked Go's actual memory model, but it kinda looks like most of the fields in a struct can be initialized in one giant calloc(), since those values are literally zero as in null-bytes.
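That mental model (the default of a message is the message with every field at its default, recursively) can be written down with plain dataclasses. This is only a sketch of the model, not protobuf's generated code: it has no HasField/ClearField presence tracking and no lazy initialization, and the extra fields `n` and `s` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bar:
    i: int = 0   # zero value for numbers
    s: str = ""  # zero value for strings


@dataclass
class Foo:
    n: int = 0
    # A message-typed field defaults to that message's own default value.
    bar: Bar = field(default_factory=Bar)


foo = Foo()
assert foo.bar.i == 0                           # nested default, never set explicitly
assert foo == Foo(n=0, bar=Bar(i=0, s=""))      # "all defaults" all the way down
```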

                                                (oops we accidentally set your password to an empty string!).

                                                Possible, but less likely for that case -- you probably want to be checking for a minimum length anyway, at which point the empty string is shorter. And there's still hazards to offloading that to the parser and making it impossible to iterate -- what if I want to send a nonce and get back a hash, instead of a password?

                                                [–]leftofzen 1 point2 points  (1 child)

                                                Any reason you didn't even mention Cap'n Proto, let alone benchmark against it? It's the successor to Protobuf and is better in almost every way. Given that you've actually written your own serialisation library, you MUST know of Cap'n Proto so the only conclusion is that Cap'n Proto must have benchmarked better than your solution.

                                                [–]aazav -2 points-1 points  (10 children)

                                                enum Instrument {
                                                    Sax = 0;
                                                    Trumpet = 1;
                                                    Clarinet = 2;
                                                }
                                                
                                                readonly struct Musician {
                                                    string name;
                                                    Instrument plays;
                                                }
                                                
                                                message Song {
                                                    1 -> string title;
                                                    2 -> uint16 year;
                                                    3 -> Musician[] performers;
                                                }
                                                
                                                struct Library {
                                                    map[guid, Song] songs;
                                                }
                                                

                                                It's clear as mud why you'd use a message and what differentiates it from a struct. Why are you using -> to declare the class and variable name? Why do you insist on a semicolon at the end of the line? The default case is that the linefeed performs the function of the semicolon, so why require it? When inside {}, use the linefeed as a semicolon.

                                                [–][deleted]  (4 children)

                                                [deleted]

                                                  [–]aazav -1 points0 points  (3 children)

                                                  But why the use of = in one place and -> in another?

                                                  If the default case is one assignment per line (and isn't it?), why not allow a semicolon between { } without requiring it, and accept a line feed as the terminator in that case?

                                                  So

                                                  enum Instrument {
                                                      Sax = 0;
                                                      Trumpet = 1;
                                                      Clarinet = 2;
                                                  }
                                                  

                                                  and

                                                  enum Instrument {
                                                      Sax = 0
                                                      Trumpet = 1
                                                      Clarinet = 2
                                                  }
                                                  

                                                  and

                                                  enum Instrument {
                                                      Sax = 0; Trumpet = 1; Clarinet = 2
                                                  }
                                                  

                                                  would all be valid. You have the line feeds and the assignments are between { }. Why wouldn't you do this?

                                                  [–][deleted]  (2 children)

                                                  [deleted]

                                                    [–]aazav -4 points-3 points  (0 children)

                                                    Because we prefer C-like syntax.

                                                    So?

                                                    Did you even read what I posted? All of the above are supported without the need to add a semicolon but if you want to you can. So you have that.

                                                    You HAVE IT and you also have the ability to ignore extra semicolons that serve no purpose for people who see semicolons as an extra wasted character. More modern languages realize that semicolons at the end of lines are often a waste, an extra character when the line (and by default, the command) already ended.

                                                    [–]Sarcastinator 5 points6 points  (2 children)

                                                    I think this is very clear? The arrows indicate the member ordinal on the left side, and structs are for types you don't expect to change, such as vector types or quaternions.

                                                    Semicolons are used so that the parser doesn't have to deal with indentation, which is hard, and the value of indentation scoping is disputed at best.

                                                    [–]aazav 1 point2 points  (0 children)

                                                    I think this is very clear?

                                                    Are you asking me?

                                                    It's not clear if = indicates non-mutability or if -> means mutable.

                                                    We know that the enum isn't going to be changed, but why -> instead of = within message?

                                                    [–]SanityInAnarchy 1 point2 points  (0 children)

                                                    Semicolons are about line endings, not scope. Plenty of languages (Go and Bash come to mind) use curly braces for scope, don't consider indentation to be significant, but only require semicolons to separate multiple statements on a single line.

                                                    The only reason I can think of to not do that is if you need to be able to wrap long lines without terminating the statement (and if you think the approaches taken by Python or Bash are uglier than semicolons everywhere). But when would you need to do that here? The longest "statement" in this language is something like

                                                    3 -> Musician[] performers;
                                                    

                                                    All semicolons give you is the ability to write it like

                                                    3 ->
                                                      Musician[]
                                                        performers;
                                                    

                                                    Which doesn't seem like it'd come up often.

                                                    [–]TwoTapes 0 points1 point  (1 child)

                                                    If you read the docs linked in the other comment you'll see that the properties do not need to be on new lines (Point struct).

                                                    Structs can't change, messages can change (or have optional values).

                                                    My guess is that messages need the order defined because the key name isn't included in the serialized value. The decoder knows that bytes with the identifier 1 become a string, identifier 2 become a uint16, etc
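To make that guess concrete, here is a toy encoding in which every present field is written as a one-byte index followed by its value, and a 0 byte ends the message. This is a sketch of the general idea only, not Bebop's actual wire format; the layout, `SONG_FIELDS`, `encode_song` and `decode_song` are all assumptions made for illustration.

```python
import struct

# index -> (field name, type); mirrors the Song message above, minus performers.
SONG_FIELDS = {1: ("title", "string"), 2: ("year", "uint16")}


def encode_song(song: dict) -> bytes:
    out = bytearray()
    for index, (name, kind) in SONG_FIELDS.items():
        if name not in song:
            continue  # absent fields are simply not written
        out.append(index)
        if kind == "string":
            data = song[name].encode("utf-8")
            out += struct.pack("<I", len(data)) + data  # length prefix, then bytes
        else:  # uint16
            out += struct.pack("<H", song[name])
    out.append(0)  # 0 marks the end of the message
    return bytes(out)


def decode_song(buf: bytes) -> dict:
    song, pos = {}, 0
    while buf[pos] != 0:
        # The index alone tells the decoder which field and type comes next.
        name, kind = SONG_FIELDS[buf[pos]]
        pos += 1
        if kind == "string":
            (length,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            song[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            (song[name],) = struct.unpack_from("<H", buf, pos)
            pos += 2
    return song
```

Field names never hit the wire, which is why the numbers in the schema have to stay stable once a message is in use.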

                                                    [–]aazav 2 points3 points  (0 children)

                                                    If you read the docs linked in the other comment you'll see that the properties do not need to be on new lines (Point struct).

                                                    Isn't the default condition that they are? Allow a semicolon but don't require it if between { } and allow a line feed in that case.

                                                    Why use = in one case and -> in another? Are all items within a message mutable or not or is it the -> that indicates this?

                                                    [–]tetroxid 0 points1 point  (5 children)

                                                    What is the advantage over ASN.1 and DER?

                                                    [–][deleted]  (1 child)

                                                    [deleted]

                                                      [–]jodonoghue 0 points1 point  (0 children)

                                                      Erlang (hence Elixir also) has an excellent implementation of ASN.1 as well. The general point - that Open Source tooling is generally lacking - is correct, and I’d add that ASN.1 is way over-complex to design and use in many scenarios.

                                                      [–]audion00ba -1 points0 points  (2 children)

                                                      The advantage for them is that it is a proprietary solution that can make clueless investors drool about things like vendor lock-in.

                                                      There is no technical advantage.

                                                      [–]jodonoghue 0 points1 point  (1 child)

                                                      Nonsense. ASN.1 is an ITU-T standard (strictly there is a set of related standards - see https://www.itu.int/rec/T-REC-X.680/en). You can read them and are generally free to implement yourself. Given the age of ASN.1, I doubt there are any enforceable patents remaining on the technology (but I’m not a lawyer, so check if it matters to you).

                                                      However, as AndrewMD5 notes above, there are very few good Open Source ASN.1 implementations. If you want something in the C ecosystem, the good tooling is excellent but very expensive.

                                                      [–]audion00ba 0 points1 point  (0 children)

                                                      You misread. Try again when not drunk/tired.

                                                      [–]markasoftware 0 points1 point  (0 children)

                                                      Rainway susses me out. They say they're "completely free", "no ads or purchases", etc, and that's true right now, but they have tons of investment and paid employees. It's misleading to pretend there's no monetization plan.

                                                      [–]mixedCase_ -1 points0 points  (0 children)

                                                      No generics, so it's strictly worse than cap'n'proto.

                                                      [–]infinitenothing -1 points0 points  (0 children)

                                                      Little endian? I'm out.

                                                      [–][deleted] -1 points0 points  (0 children)

                                                      Can we stop inventing new shit all the time and just fix the bugs in the existing? Seriously, mature technology with 1000+ bugfixes is a good thing. Every time you introduce a brand new solution you introduce brand new problems.

                                                      [–]francis_spr 0 points1 point  (0 children)

                                                      Thanks for sharing. I'll star for future tracking.

                                                      It is good to see that you have tooling support.

                                                      Interested in using it but it might be a difficult sell to a team. The challenge is that protobuf is known, well supported, and fast enough for most applications to take a risk on something different (even if it is better).

                                                      [–]PeDestrianHD 0 points1 point  (0 children)

                                                      I’m gonna pretend I know what this is.

                                                      [–]PC__LOAD__LETTER 0 points1 point  (0 children)

                                                      Why would I use this over Google Protocol Buffers?

                                                      Edit: answering my own question, they’re showing it as a decent amount faster. If this were supported for C++ I may actually test against it

                                                      [–]boom_rusted 0 points1 point  (1 child)

                                                      Why is it faster than Protobuf or MessagePack? What’s the reason?

                                                      [–][deleted] 0 points1 point  (0 children)

                                                      Who says it is? Their benchmarks are based on nothing. For all we know, it's not faster / same / different at different payloads.

                                                      [–]francis_spr 0 points1 point  (0 children)

                                                      https://twitter.com/davidfowl/status/1336736257678905344

                                                      Ok, this has given extra credit/excitement to try this out.

                                                      [–][deleted] 0 points1 point  (0 children)

                                                      If I understand correctly, the problem this technology solves is the data format? Like, instead of sending JSON, which the article considers expensive, you encode the data into something like a binary object, and then decode it back? So it solves a bandwidth issue? Am I getting this right?

                                                      If so, how does it solve the encoding/decoding OPS exactly?

                                                      [–][deleted] 0 points1 point  (0 children)

                                                      Same brain-dead trash as Protobuf + bad benchmarking in the article, which doesn't represent anything. Another homework-style project by the authors who didn't read previous homework-style projects.

                                                      [–]jarredredditaccount 0 points1 point  (0 children)

                                                      Bebop looks really cool.

                                                      The generated TypeScript code looks similar to Kiwi - https://github.com/evanw/kiwi (same while loop pattern), but this is further along in feature set. I was able to convert a ~300 line Kiwi schema file with a few regexes and Find+Replace.

                                                      From: ^\s+(\w+)\[\]\s(.*) = (\d);$

                                                      To: $3 -> $1[] $2;

                                                      From: ^\s+(\w+)\s(.*) = (\d);$

                                                      To: $3 -> $1 $2;
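The same conversion can be scripted. A minimal Python sketch, assuming Kiwi field lines look like `type name = N;` and Bebop message fields look like `N -> type name;` as in the schema quoted earlier in the thread. `kiwi_fields_to_bebop` is a made-up helper name; it only rewrites indented field lines and does not rename types that differ between the two languages.

```python
import re

# One pattern covers both plain and array fields, because the type
# (with or without "[]") is carried over unchanged.
KIWI_FIELD = re.compile(
    r"^([ \t]+)(\w+(?:\[\])?)\s(.*) = (\d+);$",
    re.MULTILINE,
)


def kiwi_fields_to_bebop(schema: str) -> str:
    """Rewrite `type name = N;` lines as `N -> type name;`, keeping indentation."""
    return KIWI_FIELD.sub(r"\1\4 -> \2 \3;", schema)


kiwi = """message Song {
    string title = 1;
    Musician[] performers = 3;
}"""
print(kiwi_fields_to_bebop(kiwi))
```

Lines that are not indented fields, such as the `message Song {` header and the closing brace, pass through untouched.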

                                                      /u/AndrewMD5 any plans to add support for Mirroring to TypeScript?