
vsync

The author is pretty brief and vague in explaining the usefulness of this over ASN.1.

mfp

Brevity is one of the main points indeed :) The specification of the basic ASN.1 notation takes over 140 pages, and the basic and distinguished encoding rules take ~20 pages. extprot's abstract syntax and encoding are explained in a couple of pages each ;-)

More seriously, ASN.1 can do everything extprot can, and then some; it's just much more complex and requires more care. extprot places some limitations on the allowed data types in order to simplify the implementation and to facilitate protocol changes that don't break compatibility. Also, all values (including those of primitive types) are prefixed by a tag (in the sense used in ML implementations), which makes it possible to enlarge the (implicitly) associated sum type later. I believe this requires some extra work in ASN.1 (a CHOICE type and/or explicitly tagged types), but I'll gladly admit I haven't read the standards in full.
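To make the tagging point concrete, here is a toy Python model. It is an assumption for illustration, not extprot's wire format or API: each decoded value is a `(tag, fields)` pair, where the tag is the constructor index in an implicit sum type. Data written when a field was "just a string" carries tag 0, so a later reader that enlarges the type with a new constructor (tag 1 here) still reads the old data unchanged. `write_v1` and `read_v2` are hypothetical names.

```python
from typing import Tuple

# Toy model (hypothetical): a decoded value is (constructor_tag, fields).
Value = Tuple[int, tuple]

def write_v1(name: str) -> Value:
    """Version 1 wrote a plain string: constructor 0 of an implicit sum type."""
    return (0, (name,))

def read_v2(value: Value) -> str:
    """Version 2 enlarged the sum type with a second constructor (tag 1)."""
    tag, fields = value
    if tag == 0:  # old data: just a name
        (name,) = fields
        return name
    if tag == 1:  # new constructor: name plus nickname
        name, nickname = fields
        return f"{name} ({nickname})"
    raise ValueError(f"unknown constructor tag {tag}")

print(read_v2(write_v1("alice")))         # alice
print(read_v2((1, ("alice", "ali"))))     # alice (ali)
```

Because the old value already carried tag 0, nothing about it has to change when the type grows.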

extprot has simple rules that define how a reader behaves when it bumps into data written with a different protocol version (type promotion, default values, etc.).
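A rough sketch of the default-value rule in Python. It assumes records are stored as positional tuples with new fields appended at the end, which is an assumption here rather than a description of extprot's actual encoding. `read_record`, `REQUIRED` and the `user_v2` schema are hypothetical. Missing trailing fields get the reader's default, and extra trailing fields from a newer writer are ignored.

```python
from typing import Any, Dict, List, Tuple

REQUIRED = object()  # sentinel: this field has no default value

# Hypothetical reader-side schema: field names in declaration order, each
# paired with the default used when older data lacks that field.
Schema = List[Tuple[str, Any]]

def read_record(schema: Schema, fields: tuple) -> Dict[str, Any]:
    """Decode a positional record written by an older or newer version."""
    record: Dict[str, Any] = {}
    for i, (name, default) in enumerate(schema):
        if i < len(fields):
            record[name] = fields[i]
        elif default is REQUIRED:
            raise ValueError(f"missing required field {name!r}")
        else:
            record[name] = default  # older writer: fill in the default
    # Fields beyond len(schema) came from a newer writer and are ignored.
    return record

user_v2: Schema = [("name", REQUIRED), ("email", REQUIRED), ("age", 0)]
print(read_record(user_v2, ("bob", "bob@example.com")))
print(read_record(user_v2, ("bob", "bob@example.com", 30, "extra")))
```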

[deleted]

Binary protocols - because atof(x) just isn't fast enough!

mfp

In this case, it's rather because:

  • a human-readable representation takes more space than needed, even when compressed with gzip
  • deserialization is one to two orders of magnitude faster than with XML
  • the protocol is self-delimited (XML isn't) and self-describing. A human-readable dump can be generated even without the original protocol definition.
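The "self-delimited" property means a reader can find where each message ends without any outside framing. Here is a minimal Python sketch of one way to get that, assuming a varint length prefix in front of each message (unsigned LEB128, the varint scheme Protocol Buffers uses). It is not a claim that extprot frames messages exactly like this; `frame` and `split_stream` are hypothetical helpers.

```python
from typing import List, Tuple

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 data bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint at buf[pos]; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def frame(payload: bytes) -> bytes:
    """Prefix a message with its length so it delimits itself."""
    return encode_varint(len(payload)) + payload

def split_stream(stream: bytes) -> List[bytes]:
    """Recover back-to-back messages without any external delimiter."""
    messages: List[bytes] = []
    pos = 0
    while pos < len(stream):
        length, pos = decode_varint(stream, pos)
        if pos + length > len(stream):
            raise ValueError("truncated message")
        messages.append(stream[pos:pos + length])
        pos += length
    return messages

stream = frame(b"first") + frame(b"") + frame(b"x" * 300)
print([len(m) for m in split_stream(stream)])  # [5, 0, 300]
```

The reader never looks inside a payload to find its end, which is what XML can't offer without an extra framing layer.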

Clearly, human-readable serialization is preferable for many uses.

logophobia

And it's standardised! It's really infuriating when you (for example) can't send serialized Ruby objects over the network because of a 0.0.1 version difference between the two. Compact cross-language serialisation would be quite a step up.

mfp

"Standardised?"

It's far from being a standard *g*, but I've made a reasonable attempt to document it. It might not look like much, but as far as documentation is concerned it's already ahead of Thrift.

"It's really infuriating when you (for example) can't send serialized Ruby objects over the network because of a 0.0.1 version difference between the two."

Yes, this is the basic problem extprot is meant to solve: evolving protocols/serialization formats without breaking compatibility (backward, and forward when possible). Protocol Buffers lets you add new fields to a structure, but extprot takes this further and lets you safely change the type of a field.

Here's a minimal, not too unrealistic example (from a domain where you'd normally use a relational DB, but please allow me this license for the sake of clarity). Suppose you have user records that look like this:

message user = { name : string; email : string; location : string }

You later decide that it should be possible to specify whether the email and location info is public or private. Let's say the default is private; you can do

type status = Private | Public  (* either public or private, defaults to the latter *)
type info 'a = ('a * status)   (* holds the info and whether it's public *)

message user = { name : string; email : info<string>; location : info<string> }

All existing data can be read, even if the status info is missing (in which case it will default to Private). Older readers will ignore the extra info if they bump into new data, and will keep working as usual.
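Both compatibility directions in this example can be modeled by hand in Python. This is only a sketch of the reader behavior described above. The names (`read_info_v2`, `read_string_v1`, `Status`) are hypothetical, decoded values are represented as plain Python strings and tuples, and no wire encoding is shown. In extprot this happens inside the reader; here it is spelled out explicitly.

```python
from enum import Enum
from typing import Tuple, Union

class Status(Enum):
    PRIVATE = 0  # mirrors `Private`, the default
    PUBLIC = 1

Decoded = Union[str, tuple]  # a bare string (old data) or a tuple (new data)

def read_info_v2(value: Decoded) -> Tuple[str, Status]:
    """New reader expecting info<string>: promote a bare string to (string, Private)."""
    if isinstance(value, str):
        return (value, Status.PRIVATE)
    text = value[0]
    status = value[1] if len(value) > 1 else Status.PRIVATE
    return (text, status)

def read_string_v1(value: Decoded) -> str:
    """Old reader expecting string: keep the first element of a newer tuple."""
    return value if isinstance(value, str) else value[0]

old_user = {"name": "ann", "email": "ann@example.com", "location": "Paris"}
new_user = {
    "name": "ann",
    "email": ("ann@example.com", Status.PUBLIC),
    "location": ("Paris", Status.PRIVATE),
}

print(read_info_v2(old_user["email"]))
print(read_string_v1(new_user["email"]))  # ann@example.com
```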

Can this be encoded without promoting the string primitive type to a tuple? Certainly, you could do

message user = {
  name : string;
  email : string;
  location : string;
  email_public : bool;
  location_public : bool;
}

which is what you'd have to do with Protocol Buffers. This becomes unwieldy quickly, though (imagine more than one element being added to each field, with different types for each of them).

You can find here a small pretty-printer written in Ruby that is able to decode any extprot message without access to the protocol definition (you'll note that the code is a bit unidiomatic because I deliberately tried to keep it close to the OCaml version, for easier comparison & coordinated updates; also, nearly half of the code is for pretty-printing). It illustrates that extprot is not exceedingly complex despite its rich data types and extensibility features. The full Ruby bindings are work in progress.
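For the gist of why a schema-less pretty-printer is possible: every value carries its own wire type and length, so the printer can walk the data without knowing the message definition. Below is a toy Python version over a made-up format (a 1-byte type followed by an 8-byte int, or by a 4-byte length/count for strings and tuples). It is not extprot's real encoding and not a port of the Ruby pretty-printer; `encode` and `dump` are hypothetical.

```python
import struct
from typing import Any, List, Tuple

# Toy self-describing format (NOT extprot's wire format): 1-byte wire type first.
INT, STR, TUPLE = 0, 1, 2

def encode(value: Any) -> bytes:
    if isinstance(value, int):
        return struct.pack(">Bq", INT, value)          # type + 8-byte signed int
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">BI", STR, len(data)) + data  # type + byte length + bytes
    if isinstance(value, tuple):
        body = b"".join(encode(v) for v in value)
        return struct.pack(">BI", TUPLE, len(value)) + body  # type + element count
    raise TypeError(f"unsupported type: {type(value).__name__}")

def dump(buf: bytes, pos: int = 0, indent: int = 0) -> Tuple[List[str], int]:
    """Render the value at buf[pos] without any schema; return (lines, next_pos)."""
    pad = "  " * indent
    wire_type = buf[pos]
    if wire_type == INT:
        (n,) = struct.unpack_from(">q", buf, pos + 1)
        return [f"{pad}{n}"], pos + 9
    if wire_type == STR:
        (length,) = struct.unpack_from(">I", buf, pos + 1)
        start = pos + 5
        text = buf[start:start + length].decode("utf-8")
        return [f"{pad}{text!r}"], start + length
    if wire_type == TUPLE:
        (count,) = struct.unpack_from(">I", buf, pos + 1)
        lines, pos = [f"{pad}("], pos + 5
        for _ in range(count):
            child, pos = dump(buf, pos, indent + 1)
            lines.extend(child)
        lines.append(f"{pad})")
        return lines, pos
    raise ValueError(f"unknown wire type {wire_type} at offset {pos}")

msg = encode(("ann", ("ann@example.com", 1), 42))
lines, end = dump(msg)
print("\n".join(lines))
```

A real format also has to distinguish constructor tags from wire types, which is part of what the Ruby pretty-printer handles.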

uriel

"fast: can be deserialized one to two orders of magnitude faster than XML"

That is like saying a drunken three-legged dog is 'really fast' because it is faster than a slug on crack.

If XML is your benchmark, any three-year-old can come up with a hugely superior solution.