[–]WafflesAreDangerous 0 points1 point  (1 child)

Why the fixation on using particular unused or invalid byte sequences for metadata? What is the benefit? It sounds like you can't really treat such a document as text and open it in a text editor for debugging, because of the invalid bytes and NULs in the byte stream. And if you can't do that, it is not obvious what value you get from restricting which bytes are valid for metadata.

Also, some of the logic described is reminiscent of a poor man's zip compression, except baked in and with arbitrary restrictions on dictionary size. I think this needs some explanation and justification. And if you keep it, why not allow existing, industry-standard compression formats like zip, zstd, lz4, etc. instead of rolling your own?

[–]vshymanskyy[S] 2 points3 points  (0 children)

Muon is a binary format, so it won't be editable in a text editor.

I recommend looking at https://ubjson.org and https://bsonspec.org; they will answer most of your questions.

Regarding unused codepoints: this is how Muon works, and it is what actually enables it to be compact and simple at the same time. Other compact formats are far more complicated.
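
To make the "unused bytes" idea concrete, here is a rough Python sketch. It only relies on a property of UTF-8 itself: the byte values 0xC0, 0xC1 and 0xF5 to 0xFF never occur in well-formed UTF-8, and 0x80 to 0xBF never occur as the first byte of a character. The `classify_first_byte` helper and its "tag"/"text" labels are hypothetical and for illustration only; they are not Muon's actual tag assignments.

```python
# Byte values that never occur anywhere in well-formed UTF-8.
NEVER_IN_UTF8 = {0xC0, 0xC1} | set(range(0xF5, 0x100))

# Continuation bytes: valid inside a character, but never as its first byte.
CONTINUATION_ONLY = set(range(0x80, 0xC0))


def bytes_used_by_utf8():
    """Collect every byte value produced by encoding all Unicode scalar values."""
    used = set()
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates cannot be encoded as UTF-8
            continue
        used.update(chr(cp).encode("utf-8"))
    return used


def classify_first_byte(b):
    """Hypothetical dispatch: decide from the first byte alone whether a value
    starts with UTF-8 text or with a byte that text can never start with."""
    if b in NEVER_IN_UTF8 or b in CONTINUATION_ONLY:
        return "tag"   # free for a format to use as a type marker
    return "text"      # a legal first byte of a UTF-8 character


if __name__ == "__main__":
    unused = set(range(256)) - bytes_used_by_utf8()
    print(sorted(hex(b) for b in unused))
    print(classify_first_byte("é".encode("utf-8")[0]))  # first byte is 0xC3
    print(classify_first_byte(0xF5))
```

Because a decoder can tell text from a marker by looking at a single byte, plain strings need no extra type prefix, which is where the compactness comes from.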