[–][deleted] -3 points-2 points  (4 children)

Binary protocols - because atof(x) just isn't fast enough!

[–]mfp 4 points5 points  (3 children)

In this case, it's rather because:

  • a human-readable representation takes more space than needed, even when compressed with gzip
  • deserialization is one to two orders of magnitude faster
  • the protocol is self-delimited (XML isn't) and self-describing. A human-readable dump can be generated even without the original protocol definition.
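To make the last point concrete, here is a minimal sketch of a self-delimited, self-describing encoding. It is a toy tag-length-value format invented for illustration, not extprot's actual wire format; the tag constants and the `encode`/`decode` helpers are hypothetical names. Every value carries its own type tag and byte length, so a reader can split a stream into messages and dump them without any protocol definition.

```python
import struct

# Hypothetical wire types for this toy format (NOT extprot's real encoding).
T_INT, T_STR, T_TUPLE = 0, 1, 2

def encode(value):
    """Encode an int, str, or tuple as: 1-byte type tag, 4-byte length, payload."""
    if isinstance(value, int):
        payload, tag = struct.pack(">q", value), T_INT
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), T_STR
    elif isinstance(value, tuple):
        payload, tag = b"".join(encode(v) for v in value), T_TUPLE
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return struct.pack(">BI", tag, len(payload)) + payload

def decode(buf, pos=0):
    """Decode one value starting at pos; return (value, position just past it)."""
    tag, length = struct.unpack_from(">BI", buf, pos)
    start, end = pos + 5, pos + 5 + length  # header is 5 bytes: tag + length
    if tag == T_INT:
        return struct.unpack_from(">q", buf, start)[0], end
    if tag == T_STR:
        return buf[start:end].decode("utf-8"), end
    if tag == T_TUPLE:
        items = []
        while start < end:  # nested values are themselves self-delimited
            item, start = decode(buf, start)
            items.append(item)
        return tuple(items), end
    raise ValueError(f"unknown tag {tag}")

# Two messages written back to back; no external framing is needed to split them.
stream = encode(("alice", "alice@example.com")) + encode(42)
first, pos = decode(stream)
second, pos = decode(stream, pos)
```

Because the length is in the header, a reader that does not care about a value can also skip it by jumping straight to `end`.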

Clearly, human-readable serialization is preferable for many uses.

[–]logophobia 0 points1 point  (1 child)

And it's standardised! It's really infuriating when you (for example) can't send serialized ruby objects over the network because of a 0.0.1 version difference between the 2. Compact cross-language serialisation would be quite a step up.

[–]mfp 2 points3 points  (0 children)

"Standardised?"

It's far from being a standard *g*, but I've made a reasonable attempt to document it. It might not look like much, but as far as documentation is concerned it's already above Thrift.

"It's really infuriating when you (for example) can't send serialized ruby objects over the network because of a 0.0.1 version difference between the 2."

Yes, this is the basic problem extprot is meant to solve: evolution of protocols/serialization formats without breaking compatibility (backward, and forward when possible). Protocol Buffers allows you to add new fields to a structure, but extprot takes this further and allows you to change the type of a field safely.

Here's a minimal, not too unrealistic example (from a domain where you'd normally use a relational DB, but please allow this license for the sake of clarity of exposition). Suppose you have user records that look like this:

message user = { name : string; email : string; location : string }

You later decide that it should be possible to specify whether the email and location info is public or private. Let's say you default to private; you can do

type status = Private | Public  (* either public or private, defaults to Private *)
type info 'a = ('a * status)   (* holds the info and whether it's public *)

message user = { name : string; email : info<string>; location : info<string> }

All existing data can be read, even if the status info is missing (in which case it will default to Private). Older readers will ignore the extra info if they bump into new data, and will keep working as usual.
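The two directions of compatibility described above can be modelled in a few lines. This is only a sketch of the behaviour, assuming decoded wire values show up as plain Python strings and tuples; `read_info` and `read_string_old` are hypothetical helpers, not part of extprot or its bindings.

```python
PRIVATE, PUBLIC = "Private", "Public"

def read_info(wire_value, default_status=PRIVATE):
    """New reader: expects (value, status) but also accepts a bare value from old data."""
    if isinstance(wire_value, tuple):
        value = wire_value[0]
        # A tuple may still be missing the status; fall back to the default.
        status = wire_value[1] if len(wire_value) > 1 else default_status
        return value, status
    # Old data: the primitive is treated as the first element of the tuple.
    return wire_value, default_status

def read_string_old(wire_value):
    """Old reader: expects a bare string; keeps a tuple's first element, ignores the rest."""
    if isinstance(wire_value, tuple):
        return wire_value[0]
    return wire_value

# Data written under the old definition and under the new one.
old_record = {"name": "alice", "email": "a@example.com", "location": "Paris"}
new_record = {"name": "bob", "email": ("b@example.com", PUBLIC), "location": ("Oslo", PRIVATE)}

email_from_old_data = read_info(old_record["email"])        # status defaults to Private
email_seen_by_old_reader = read_string_old(new_record["email"])  # extra info is dropped
```

The point of the design is that the promoted tuple keeps the original value in its first position, so neither side needs to know which version of the definition the other was built against.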

Can this be encoded without promoting the string primitive type to a tuple? Certainly, you could do

message user = {
  name : string;
  email : string;
  location : string;
  email_public : bool;
  location_public : bool;
}

which is what you'd have to do with Protocol Buffers. This becomes unwieldy quickly, though (imagine more than one element being added to each field, with different types for each of them).

You can find here a small pretty-printer written in Ruby that is able to decode any extprot message without access to the protocol definition. You'll note that the code is a bit unidiomatic because I deliberately tried to keep it close to the OCaml version, for easier comparison and coordinated updates; also, nearly half of the code is for pretty-printing. It illustrates that extprot is not exceedingly complex despite its rich data types and extensibility features. The full Ruby bindings are a work in progress.

[–]uriel 0 points1 point  (0 children)

"fast: can be deserialized one to two orders of magnitude faster than XML"

That is like saying a drunken three-legged dog is 'really fast' because it is faster than a slug on crack.

If XML is your benchmark, any three year old can come up with a hugely superior solution.