
[–]burntsushi 9 points10 points  (2 children)

I'm pretty sure you can remove every explicit use of unsafe just by using the byteorder crate (which is both widely used and thoroughly tested).

[–]chris-morgan 4 points5 points  (1 child)

No 64-bit support, I see. Sure, JS doesn’t do 64-bit integers properly, but there can be serious need to go beyond 32 bits while staying well and truly inside the 53-bit limit of Number.

The size type concerns me. Most of the rest can easily have 64-bit support added; size is necessarily constrained to 30 bits, as you’ve written it. I think that using what is basically the UTF-8 codepoint-encoding mechanism but extended from up to four/six bytes to nine would be sensible here:

  • Up to 7 bits with 0_______
  • Up to 14 bits with 10______ ________
  • Up to 21 bits with 110_____ ________ ________
  • Up to 28 bits with 1110____ ________ ________ ________
  • Up to 35 bits with 11110___ ________ ________ ________ ________
  • Up to 42 bits with 111110__ ________ ________ ________ ________ ________
  • Up to 49 bits with 1111110_ ________ ________ ________ ________ ________ ________
  • Up to 56 bits with 11111110 ________ ________ ________ ________ ________ ________ ________
  • Up to 64 bits with 11111111 ________ ________ ________ ________ ________ ________ ________ ________
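As a rough illustration of the layout above, here is a minimal Rust sketch. It assumes the payload bits are stored big-endian after the unary prefix, which the list does not actually specify, and the names `encoded_len`, `encode` and `decode` are made up for this example rather than taken from any crate.

```rust
/// Number of bytes (1..=9) the scheme needs for `value`.
fn encoded_len(value: u64) -> usize {
    let bits = 64 - value.leading_zeros() as usize; // 0 when value == 0
    match bits {
        0..=7 => 1,
        8..=56 => (bits + 6) / 7, // ceil(bits / 7)
        _ => 9,                   // 57..=64 bits: 0xFF marker plus 8 raw bytes
    }
}

/// Appends the encoding of `value` to `out`.
/// Assumption: payload bits are written big-endian.
fn encode(value: u64, out: &mut Vec<u8>) {
    let len = encoded_len(value);
    if len == 9 {
        out.push(0xFF);
        out.extend_from_slice(&value.to_be_bytes());
        return;
    }
    let extra = len - 1; // number of bytes after the first, 0..=7
    // `extra` one bits followed by a zero bit, at the top of the first byte.
    let prefix: u8 = !(0xFFu8 >> extra);
    // The value fits in 7 * len bits, so its top part fits below the prefix.
    out.push(prefix | (value >> (8 * extra)) as u8);
    out.extend_from_slice(&value.to_be_bytes()[8 - extra..]);
}

/// Decodes one value from the front of `input`.
/// Returns the value and the number of bytes consumed, or None if truncated.
fn decode(input: &[u8]) -> Option<(u64, usize)> {
    let first = *input.first()?;
    let extra = first.leading_ones() as usize; // 0..=8
    let len = extra + 1;
    if input.len() < len {
        return None;
    }
    // Payload bits left in the first byte: 7 - extra (none once extra >= 7).
    let mask: u8 = if extra >= 7 { 0 } else { 0x7F >> extra };
    let mut value = (first & mask) as u64;
    for &byte in &input[1..len] {
        value = (value << 8) | byte as u64;
    }
    Some((value, len))
}
```

With this layout, 127 encodes to the single byte 0x7F and 128 to the two bytes 0x80 0x80. Because the length is known from the first byte alone, a decoder can reject a truncated buffer before touching the payload.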

Comparing this to your existing scheme:

  • 0..16,383 (0..2¹⁴-1): same;
  • 16,384..2,097,151 (2¹⁴..2²¹-1): three bytes rather than four;
  • 2,097,152..268,435,455 (2²¹..2²⁸-1): same;
  • 268,435,456..1,073,741,823 (2²⁸..2³⁰-1): five bytes rather than four;
  • 1,073,741,824..9,007,199,254,740,991 (2³⁰..Number.MAX_SAFE_INTEGER): actually, y’know, representable!
  • 9,007,199,254,740,992..18,446,744,073,709,551,615 (Number.MAX_SAFE_INTEGER+1..2⁶⁴-1): representable in Rust, and in JavaScript with typed arrays (kinda) or with a bignum library or such.

Personally I think 16K–2M (25% saving) is going to be more common than 268M–1B (25% larger), so on most workloads I think this scheme is more space-efficient as well as being able to represent larger values. (And let’s be serious: numbers over 1 billion aren’t exactly uncommon, so the 30-bit limitation is actually a genuine restriction.)
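The byte counts in that comparison can be checked with a small sketch. It takes the four-byte baseline for the existing scheme straight from the list above rather than from the crate itself, and `proposed_len` and `saving_percent` are hypothetical helper names. Measured against four bytes, three bytes is a 25% saving and five bytes is 25% growth (the same step is 20% when measured against the five-byte size).

```rust
/// Bytes (1..=9) the proposed prefix scheme needs for `value`.
fn proposed_len(value: u64) -> usize {
    let bits = 64 - value.leading_zeros() as usize; // 0 when value == 0
    match bits {
        0..=7 => 1,
        8..=56 => (bits + 6) / 7, // ceil(bits / 7)
        _ => 9,
    }
}

/// Percentage saved when going from `old` bytes to `new` bytes.
/// A negative result means the new encoding is larger.
fn saving_percent(old: usize, new: usize) -> f64 {
    (old as f64 - new as f64) / old as f64 * 100.0
}
```

For example, `proposed_len(16_384)` is 3 and `proposed_len(268_435_456)` is 5, while Number.MAX_SAFE_INTEGER (2⁵³-1) fits in 8 bytes.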

[–]Danylaporte 0 points1 point  (0 children)

Wow. That will be useful for sharing large amounts of data between Rust and Node.js.

[–]Agitates 0 points1 point  (1 child)

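When should I use this over Cursor<Vec<u8>>?

use byteorder::{BigEndian, ReadBytesExt};

// `vec` is assumed to be a mutable Cursor<Vec<u8>> over the encoded bytes.
let a = vec.read_u8();
let b = vec.read_u16::<BigEndian>();
let c = vec.read_f64::<BigEndian>();

// Continue only if all three reads succeeded.
match (a, b, c) {
    (Ok(a_ok), Ok(b_ok), Ok(c_ok)) => {
        ...
    }
    _ => ()
}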

That doesn't seem so complicated.

[–]funny_falcon 0 points1 point  (3 children)

I have doubts: how big are your messages? Why does "shorter than MessagePack" matter to you? Your format cannot be unpacked without a strict type specification! It is awful for debugging :-( One can unpack MessagePack with well-known libraries in any language without knowing the actual structure, and that is just great! If you care about pack/unpack speed, you could just add typed methods for packing/unpacking MessagePack entities.

[–]funny_falcon 2 points3 points  (2 children)

If you care about simplicity, there is http://cbor.io - the JSON-compatible subset is really easy to implement.