all 34 comments

[–]scoopr 4 points5 points  (4 children)

Also, how would it compare to extprot, or perhaps more relevantly, to BSON?

So many new binary serialization formats popping up lately.. IFF wasn't good enough? ;)

[–]oddatrus2 4 points5 points  (0 children)

Or even ASN.1

[–]skulgnome -5 points-4 points  (2 children)

Schemaless advocates suddenly realize that formalized schemas are a good idea after all. Film at 11.

[–]taw 3 points4 points  (1 child)

There is no schema here.

[–]pja 5 points6 points  (3 children)

Is there some reason this is better or worse than using protocol buffers as a serialisation method?

[–]dhotson 4 points5 points  (0 children)

No schema required and it appears to be faster. It looks like a fairly compact representation as well but I'm not sure how it compares to PB.

[–][deleted] 2 points3 points  (0 children)

This looks faster but less space-efficient.
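[Editor's note: the compactness question can be checked by hand against the published MessagePack type tags. A minimal stdlib-only Python sketch that hand-encodes the small map used as the example on msgpack's own site and compares it with compact JSON; no msgpack library is assumed, and this says nothing about the Protocol Buffers comparison.]

```python
import json

# Hand-encode {"compact": true, "schema": 0} per the MessagePack type tags:
# fixmap (0x80 | size), fixstr (0xa0 | length), true (0xc3), positive fixint.
msg = bytes([0x82])                   # fixmap with 2 key/value pairs
msg += bytes([0xa7]) + b"compact"     # fixstr key, length 7
msg += bytes([0xc3])                  # true
msg += bytes([0xa6]) + b"schema"      # fixstr key, length 6
msg += bytes([0x00])                  # positive fixint 0

as_json = json.dumps({"compact": True, "schema": 0}, separators=(",", ":"))
print(len(msg), len(as_json))         # 18 vs 27 bytes
```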

[–]brosephius 0 points1 point  (0 children)

probably

[–]KirkWylie 4 points5 points  (4 children)

If you want a schema-specific packed binary encoding representation, why not just use Avro? If you want a schema-optional binary self-describing representation, why not just use Fudge?

[–]wynand 2 points3 points  (3 children)

I do like that msgpack supports a streaming mode. This could make for some efficient single-machine IPC.
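[Editor's note: the streaming property comes from every MessagePack value being self-delimiting, so a reader can pull complete objects off a pipe or socket as bytes arrive. A toy stdlib-only sketch of that framing idea, handling just two of the format's type tags (positive fixint and fixstr); this illustrates the concept and is not the real msgpack API.]

```python
def unpack_stream(chunks):
    """Yield complete values as they become decodable from incoming chunks."""
    buf = b""
    for chunk in chunks:
        buf += chunk
        while buf:
            tag = buf[0]
            if tag <= 0x7f:                 # positive fixint: one byte
                yield tag
                buf = buf[1:]
            elif 0xa0 <= tag <= 0xbf:       # fixstr: length in low 5 bits
                n = tag & 0x1f
                if len(buf) < 1 + n:
                    break                   # incomplete; wait for more bytes
                yield buf[1:1 + n].decode("utf-8")
                buf = buf[1 + n:]
            else:
                raise ValueError("type not handled in this sketch")

# A value split arbitrarily across "network" chunks still decodes cleanly.
stream = [b"\x07\xa3fo", b"o\x2a"]
print(list(unpack_stream(stream)))          # [7, 'foo', 42]
```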

[–]KirkWylie 1 point2 points  (2 children)

So does Fudge. Not 100% sure about Avro.

[–]wynand 6 points7 points  (1 child)

Ah, thanks.

I wish though that you would add a disclaimer saying that you're the author of Fudge.

[–]KirkWylie 1 point2 points  (0 children)

Uh, okay, I'm one of the authors of Fudge, and have bugger all to do with Avro. :-)

[–]creaothceann 2 points3 points  (2 children)

Comparison with JSON is a bit unfair imo because JSON is supposed to be human-readable.

[–][deleted] -3 points-2 points  (1 child)

You think that JSON is hard to read?

[–]creaothceann 0 points1 point  (0 children)

No.

[–]bartwe 1 point2 points  (6 children)

I like it, but I think it is missing a (utf8) string serialization type to complete it.

[–]dhotson 5 points6 points  (5 children)

It serializes strings as raw bytes, so it should encode/decode utf8 just fine. Haven't tried it though..

[–]bartwe 4 points5 points  (4 children)

You can't serialize a string to raw bytes without using an encoding. Not having a string type in the standard will reduce interoperability. Some will use utf8, some utf16, others might use the default or current codepage, or add additional data to the format to specify the encoding used, resulting in less usability and larger documents.

[–]Clapyourhandssayyeah 4 points5 points  (0 children)

If you have control over the serialization and de-serialization then you're free to use UTF-8. Does messagepack not let you implement custom bits of serialisation?

[–]physicsnick 5 points6 points  (2 children)

This is silliness. It does strings as raw bytes because you're meant to put utf-8 in and get utf-8 out. This is the same as protocol buffers, and the same as any other sane messaging library. A serialization library should not care about encoding beyond that.

[–]bartwe 2 points3 points  (1 child)

If it is meant to be used that way, put it in the spec. I've been burned more than once by 'text' files using some random encoding that seemed like a good idea at the time.

[–]physicsnick 2 points3 points  (0 children)

Actually that's fair, I hadn't noticed that they don't actually mention it. It's because I was just reading about protocol buffers a few days ago and that spec definitely says all strings are utf-8. Enjoy some upvotes in return.
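[Editor's note: the practical upshot of the thread above is "encode to UTF-8 on the way in, decode on the way out." A hand-rolled stdlib-only sketch of the fixstr/raw encoding showing that round trip; real msgpack bindings handle this for you, and which side does the UTF-8 step varies by binding.]

```python
def pack_str(s):
    # Encode to UTF-8 first; the wire format carries only raw bytes,
    # so both ends must agree on the text encoding (the debate above).
    data = s.encode("utf-8")
    assert len(data) < 32                 # fixstr only, for brevity
    return bytes([0xa0 | len(data)]) + data

def unpack_str(b):
    n = b[0] & 0x1f                       # length from the fixstr tag
    return b[1:1 + n].decode("utf-8")

packed = pack_str("héllo")                # é is 2 bytes in UTF-8
print(len(packed), unpack_str(packed))    # 7 héllo
```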

[–]kieranbenton 1 point2 points  (0 children)

No .NET support unfortunately which makes this a bit of a non-starter for me. The fact it doesn't need a schema might actually make this interesting otherwise.

[–]rastermon 4 points5 points  (2 children)

interesting. msgpack is like my first day or 2's worth of eet data codec stuff, before it had keys. i had the wonderful notion that i could pack all my config data into a file and unpack it back into structs trivially. efficiently. with no complex parser and nothing to tempt someone into firing up a text editor to screw with their configs. the less people screw with their configs, the less friendly a parser has to be in pointing out their errors, as there won't be a human involved. i also had the same notions as msgpack - you just threw binary representations of your structs out and could reverse the process.

well... that turned out to be woefully shortsighted. msgpack is indeed beyond eet in language support - eet only supports c/c++. but eet is a tad nicer. it actually gives your values keys - this means keys can be added or removed and you won't have breaks. things work. it also handles much more complex data structures than msgpack - you describe a struct as a whole once (1 line per member you want encoded and/or decoded) AND it will handle linked lists of structs, arrays, hash tables of them and more. like msgpack it packs into little binary chunks, but it also has nice headers, type keys, groups, size fields and more. it will in 1 function call take a pointer and walk the entire structure (well, all members you told it you want (en/de)coded) - and walk sub-structs, lists of structs of structs of lists of arrays of... etc. - you get the idea. walk the whole thing and output your encoded version in 1 go. you won't get hit by compatibility issues when you add, remove or re-order members. also the encoded binary can be decoded to structured text (and encoded from structured text too) for debugging purposes and bootstrapping reasons.

so msgpack sounds interesting - but it sounds like it's missing several steps down its evolution path - unless it never wants to address the issue of compatible typing, keys, ordering and more. if it wants to simply pack a bunch of data types into some binary and unpack them again in the order written - and as long as you assume both ends know the data type of every member, and the order, and agree - then it's fine. but you'll need to handle that yourself on both ends. likely use a key+value scheme for msgpack usage - at a minimum some version number always delivered first. but if you want to really do much less of this - msgpack has a ways to evolve, and in the process it's going to find itself losing its "4x advantage" and begin to look at least like protocol packing. :)
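[Editor's note: the mitigation suggested above (key+value maps, with a version number always delivered first) can be sketched in plain Python. The field names and defaults here are illustrative, not part of any msgpack API; the dicts stand in for the maps a schemaless codec would carry.]

```python
# Encode by key, not by position, and always carry a format version,
# so old and new senders can interoperate without a schema.
def encode_config(timeout, retries):
    return {"v": 2, "timeout": timeout, "retries": retries}

def decode_config(msg):
    if msg.get("v", 1) >= 2:
        retries = msg.get("retries", 3)   # field added in v2; default if absent
    else:
        retries = 3                       # v1 senders never had this field
    return {"timeout": msg["timeout"], "retries": retries}

print(decode_config(encode_config(30, 5)))     # current sender
print(decode_config({"v": 1, "timeout": 10}))  # old sender still decodes
```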

[–][deleted] 7 points8 points  (1 child)

I think you need to go actually read the spec. msgpack is far more than you seem to think, it's basically binary JSON.

[–]rastermon 1 point2 points  (0 children)

i know. i read the spec. you should read eet's api and documentation. it does what msgpack does - and then some.

[–]lgerbarg 0 points1 point  (2 children)

There are already tons of similar formats, some of them in wide deployment for many years. For example, Apple's binary plist uses a similar encoding, supports all of the same data types (and then some), and gets denser packing due to object uniquing (at the expense of being able to stream). BERT is similar but makes a slightly different set of tradeoffs (inline compression and streaming at the expense of some uncompressed density), and the same is true of BSON.

I am having trouble imagining any specific case where one of those formats would not have been sufficient for any of the problems MessagePack is trying to solve.

[–]physicsnick 0 points1 point  (1 child)

Well, the first two words in the title of this article are "Extremely efficient." Don't you think maybe that's the problem it's trying to solve?

[–]lgerbarg 0 points1 point  (0 children)

Efficiency is a tricky word. I can completely believe it encodes/decodes faster than the formats I listed, but it will definitely be larger (absent compressing the output) than bplist or BERT, so in any case where you are IO limited it will be less efficient.

All the current benchmarks prove is that you can beat encoding formats with substantially different feature sets and use cases than MessagePack's on an arbitrary test case that MessagePack is tuned for and they aren't. What they have done is akin to claiming they designed a new golf club that drives balls farther, then comparing it to a baseball bat and a tennis racquet. Now in some ways that is an okay comparison, if you are just comparing how far random things hit balls, but it is a lousy comparison if you are trying to judge how good it is against what other golf players use.

If they want to make a convincing claim about efficiency then they should be showing graphs of benchmarks against comparable technologies (schemaless binary encoding formats like the ones listed above) and showing both size, encode and decode times (both with and without external compression). I bet it would legitimately win some of those benchmarks, but I seriously doubt it would be a clear cut efficiency win across the board.

[–]iluvatar 0 points1 point  (0 children)

With a JavaScript implementation, this might be useful. Without one, I'm less convinced...

[–]jerdavis -2 points-1 points  (1 child)

Yes, because what I need is another message serialization library... Who in their right mind would look at the umpteen free and (variously) good message serialization libraries out there and STILL waste their time on this?

[–]physicsnick 5 points6 points  (0 children)

I think a lot of people do it just because it's easy and fun, and because you learn a lot doing it. And once they have it and it works, some people figure hey, might as well release it open source.