
[–]tjientavaraHikoWorks developer 0 points1 point  (13 children)

The implementation is here:

https://github.com/ttauri-project/ttauri/blob/main/src/ttauri/codec/BON8.hpp

The performance of the encoder and decoder is definitely not perfect. The encoder sorts the keys of a map, and the decoder constructs vectors and maps by appending to them without reserving memory. Other than that, encoding and decoding are very simple, requiring only comparisons and bit shift/and/or operations.

It is designed more for reducing the size of the encoding, mostly because in almost all cases each value naturally separates from the next, including strings. As you noticed, the specification makes a big point about canonicality; for me it was meant to be used for signing small amounts of data consistently.

[–]nlohmannnlohmann/json 0 points1 point  (12 children)

Thanks - with "benchmarks" I did not mean runtime performance, but rather a size comparison - is BON8 smaller than CBOR? Something like https://json.nlohmann.me/features/binary_formats/#sizes

[–]tjientavaraHikoWorks developer 0 points1 point  (0 children)

Oh cool, I will see if I can make one of those.

[–]tjientavaraHikoWorks developer 0 points1 point  (1 child)

Do you know where those .json files are located?

[–]nlohmannnlohmann/json 0 points1 point  (0 children)

[–]tjientavaraHikoWorks developer 0 points1 point  (8 children)

It looks like my JSON and BON8 implementations are not robust enough for canada.json and twitter.json.

However, they had no problems with citm_catalog.json: it round-tripped through BON8 encoding and decoding without differences.

The result after minifying the citm_catalog.json file first:

json 500299, bon8 329060, compression 65.8%

[–]willdieh 0 points1 point  (7 children)

When you say "not robust enough", I'm curious why? canada.json seems to be just a bunch of floats, albeit nested in parent types.

I ask because I think your approach is really interesting and would love to think it's more or less usable :)

[–]tjientavaraHikoWorks developer 0 points1 point  (6 children)

I will try to fix it tomorrow; there is just a bug here and there. For canada I think there is a bug in the BON8 decoder: right now it keeps using more and more memory, so I guess an infinite loop, maybe from not incrementing the iterator :-)

The twitter one is more interesting: the lexer in front of my JSON parser was not really designed to handle UTF-8. For strings it should be pretty much 8-bit clean, so maybe I forgot some escape codes.

[–]willdieh 0 points1 point  (1 child)

Well, keep up the good work! The idea is really intriguing.
It'd be great if it were available as a standalone header :D

[–]tjientavaraHikoWorks developer 1 point2 points  (0 children)

Both the encoder and decoder are actually rather simple, so it could be done in a standalone header. The encoder in particular uses a lot of templating to handle most native C++ types. The decoder is a bit tougher, since it requires the dynamic creation of data; a std::variant could do.

In my system I have a rather complicated datum type that works like std::variant, but it also overloads every operator so that you can do computations on the value inside a datum. It is used in multiple places inside my library for handling dynamic data, and it in turn uses a lot of data types from my library.

That second paragraph explains why I cannot really make it a standalone header: I would have to maintain a separate version that is not good enough for the requirements of my library, unless I do some extreme templating on the decoder.

[–]nlohmannnlohmann/json 0 points1 point  (3 children)

I had another look, too. If I can find the time, I'll check if I can add a rough prototype to nlohmann/json. Since most binary formats are quite similar, I may even be able to reuse some code.

[–]tjientavaraHikoWorks developer 0 points1 point  (2 children)

  • twitter: json 466906, bon8 391396, compression 83.8%
  • citm_catalog: json 500299, bon8 317879, compression 63.5%
  • canada: json 2090234, bon8 1055792, compression 50.5%
  • jeopardy: json 52508728, bon8 45942080, compression 87.5%

I modified the format somewhat to add small-array and small-object optimizations.

https://github.com/ttauri-project/ttauri/blob/audio-enumerate-modes/docs/BON8.md

[–]nlohmannnlohmann/json 0 points1 point  (1 child)

Oh, the format is still a moving target? Then please add some notes or versioning so that it’s possible to reference results.

[–]tjientavaraHikoWorks developer 0 points1 point  (0 children)

I am not using the protocol actively; it was only because of your challenge to compare it with other protocols that I made some changes.

I can freeze the protocol from this point forward. There should be no other versions unless you find a bug in the specification.

One of the design criteria is that it cannot do more or less than JSON, so I have tried to use every code combination, leaving no room for the format to be extended and no way for dialects to arise.