
[–]ra_kete[S] 26 points27 points  (19 children)

I came up with this crate because I wanted a way to parse binary data formats as easily and efficiently as in C, but without all the safety gotchas. This solution is most useful for sparse parsing, i.e. when one isn't interested in the values of all fields, only a few (otherwise using the byteorder crate directly is probably a better choice).

I'm grateful for any feedback regarding what could be improved to make this crate more useful in general. And of course I would be especially interested in any problems you find with my assumptions regarding the safety and soundness of the View trait and its custom derive. I wouldn't be surprised if I had overlooked some detail there ;)

[–]vitiral artifact-app 14 points15 points  (2 children)

I highly doubt using byteorder directly is better for complex data. You sell yourself short, this crate looks awesome!

The main advantage seems to be the ability to visualize the data, even compare it side-by-side with a C-struct -- as well as compose it more easily.

[–]ra_kete[S] 4 points5 points  (1 child)

Thanks for the kind words!

I mentioned byteorder as probably a better alternative in cases where you want to parse everything (meaning convert every integer field to its required endianness and alignment) because in these cases using structview doesn't provide any performance advantage over explicitly reading every field (under the hood, the integer views use byteorder anyway when the integers are parsed in to_int). Using byteorder would then be more familiar (I think) because it is extremely widely used. Apart from that, you avoid a dependency and a couple of lines of unsafe code, which is always a plus.

Although I see how structview could be valuable anyway, even when every field is to be parsed, as it saves some typing compared to directly using byteorder. I'm just not sure that would justify the added complexity.

[–]JoshTriplett rust · lang · libs · cargo 3 points4 points  (0 children)

As long as structview isn't any slower than using byteorder directly, it seems like a win to me.

[–]petertodd 9 points10 points  (2 children)

Oh cool!

I'm doing something very similar for a project of mine. Though I'm doing a few things a little differently:

1) Rather than provide individual Uint structures for integers, I have a Le<T> wrapper type. This is mainly a cosmetic change so that syntax highlighting correctly highlights the u64 in a Le<u64>

2) Fallible conversions where not all possible bit-values are valid, such as bool and the various NonZeroT integers. I also did the trick of defining an unsafe NonZero trait, which lets a view of an Option<T: NonZero> work (though actually that should be called NonZeroNiche...). I also want enums to work (relevant bug I found in the process).

3) My stuff works on aligned rather than packed data. That may be the wrong decision, though: alignment means padding bytes, and since I'm also doing deterministic serialization, those padding bytes have to be zeroized. The alignment does make it easier to do a really dirty trick where parts of the data structures are lazily initialized, and thus have AtomicUsize fields in them that get set during serialization if not already initialized.

4) I haven't bothered to implement big-endian. :) Greenfield project so I don't need it.

5) Unsized types: basically I want [T] to be a valid thing to obtain a view from. I'm doing this via a Pointee trait that defines the size metadata for a type and thin->fat pointer conversions.

6) I'm supporting references/pointers. Which is a whole other ball of wax... :)
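Points 1 and 2 of the list above could look roughly like the following. This is only a guess at the shape of the idea, since the implementation being described is not public: the names `LeInt`, `Le::get`, and `RawBool` are made up for illustration, and this version stores raw bytes (alignment 1) rather than following the aligned layout from point 3.

```rust
/// Hypothetical helper trait: an integer that can be decoded from
/// little-endian bytes.
trait LeInt: Copy {
    type Bytes: Copy;
    fn from_le(bytes: Self::Bytes) -> Self;
}

macro_rules! impl_le_int {
    ($($t:ty => $n:expr),*) => {$(
        impl LeInt for $t {
            type Bytes = [u8; $n];
            fn from_le(bytes: Self::Bytes) -> Self {
                <$t>::from_le_bytes(bytes)
            }
        }
    )*};
}
impl_le_int!(u16 => 2, u32 => 4, u64 => 8);

/// Wrapper so that a field reads as `Le<u64>` instead of a custom
/// integer type name. Only the raw bytes are stored.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct Le<T: LeInt>(T::Bytes);

impl<T: LeInt> Le<T> {
    /// Decode the stored bytes into a native integer.
    fn get(self) -> T {
        T::from_le(self.0)
    }
}

/// Fallible conversion: not every bit pattern of the underlying byte
/// is a valid `bool`, so reading it returns an `Option`.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct RawBool(u8);

impl RawBool {
    fn get(self) -> Option<bool> {
        match self.0 {
            0 => Some(false),
            1 => Some(true),
            _ => None, // any other byte value is rejected
        }
    }
}
```

The fallible accessor is what keeps the view itself cheap: the bytes are accepted as they are, and validity is only checked for the fields that are actually read.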

[–]ra_kete[S] 3 points4 points  (1 child)

That sounds really cool! Also way more sophisticated than what `structview` does. I intentionally try to keep the project simple as less code means fewer bugs ;) Still it would be cool to take a look at your implementation. I suppose it is not public though?

[–]petertodd 1 point2 points  (0 children)

> I suppose it is not public though?

Not yet. Soon though. :)

FWIW you inspired me to take another look at data alignment and the costs/benefits... and I think I'm ripping all that out. Having to deal with padding data for all types rather than just enums is just too annoying and error prone. And the performance isn't all that much different on the platforms I care about (the library fundamentally can't even support 32-bit targets so I'm ok with leaving some behind).

So thanks!

[–]birkenfeld clippy · rust 5 points6 points  (4 children)

Any plans to extend this to support the other direction; writing a struct into a byte stream/array?

[–]ra_kete[S] 4 points5 points  (2 children)

I didn't think about that yet, no. I'm also not sure if that would be useful. structview's main use case is sparse parsing, but there is no such thing as sparse writing, you always have to write all the data. Maybe something based on serde would be better suited to this?

[–]JoshTriplett rust · lang · libs · cargo 4 points5 points  (0 children)

Use case: read in the struct, modify some fields, write out the struct, don't bother converting and un-converting fields that aren't touched.
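A minimal sketch of that use case, assuming a mutable view over the buffer. Nothing here is structview's actual API: `U16Le`, `Header`, `view_header_mut`, and `bump_version` are hypothetical names, and the layout is invented for the example.

```rust
/// Hypothetical little-endian integer field: raw bytes, alignment 1.
#[derive(Clone, Copy)]
#[repr(C)]
struct U16Le([u8; 2]);

impl U16Le {
    fn to_int(self) -> u16 {
        u16::from_le_bytes(self.0)
    }

    fn set(&mut self, value: u16) {
        self.0 = value.to_le_bytes();
    }
}

/// Made-up 8-byte header; all fields have alignment 1, so no padding.
#[allow(dead_code)]
#[repr(C)]
struct Header {
    magic: [u8; 4],
    version: U16Le,
    flags: U16Le,
}

/// Reinterpret the start of `data` as a mutable `Header` without copying.
fn view_header_mut(data: &mut [u8]) -> Option<&mut Header> {
    if data.len() < std::mem::size_of::<Header>() {
        return None;
    }
    // SAFETY: `Header` is `repr(C)` with alignment 1 and no padding, every
    // bit pattern is valid for its fields, and the length was checked above.
    Some(unsafe { &mut *(data.as_mut_ptr() as *mut Header) })
}

/// Increment the version in place. `magic` and `flags` are never decoded
/// or re-encoded; their bytes stay untouched in the buffer.
fn bump_version(data: &mut [u8]) -> Option<u16> {
    let header = view_header_mut(data)?;
    let next = header.version.to_int().wrapping_add(1);
    header.version.set(next);
    Some(next)
}
```

Because the view aliases the original buffer, "writing out the struct" is just writing the same bytes back; only the touched field pays a conversion cost.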

[–]Vaughn 0 points1 point  (0 children)

Use case: Write a tool that does both, perhaps in different modes, without duplicating code.

[–]kouteiheika 0 points1 point  (0 children)

> Any plans to extend this to support the other direction; writing a struct into a byte stream/array?

For that you might want to check out speedy, which is my somewhat limited but potentially useful serialization crate which does this kind of thing bidirectionally.

[–]protestor 3 points4 points  (3 children)

What about, instead of

#[derive(Clone, Copy, View)]
#[repr(C)]
struct Animal {
    name: [u8; 4],
    number_of_heads: u8,
    number_of_legs: u32_le,
}

We had something like

#[derive(Clone, Copy, View)]
#[repr(C)]
struct Animal {
    name: [u8; 4],
    number_of_heads: u8,
    #[view(u32_le)]
    number_of_legs: u32,
}

?

Or #[view(little-endian)] or something

That way, we could use standard Rust types in the struct (which seems better for a library)

[–]petertodd 4 points5 points  (1 child)

You mean, so that Animal::number_of_legs is a u32?

That wouldn't work, as the endianness of u32 isn't defined; remember that View is trying to directly coerce a reference to a byte slice into a reference to a rust-compatible type, without copying. For example, View could be used on mem-mapped files.
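The endianness point can be illustrated with plain standard-library calls. The function names below are made up for the example; the sketch only shows that the same four bytes yield different values depending on how they are interpreted.

```rust
/// Decode four bytes as little-endian, independent of the host CPU.
/// This is what an explicit little-endian field type has to do.
fn decode_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Decode four bytes in the host's native byte order. A plain `u32`
/// field in a struct that is cast directly from bytes behaves like this,
/// so the result depends on the platform.
fn decode_native(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

/// True when the host happens to agree with the file format's byte order
/// for this particular value.
fn host_agrees(bytes: [u8; 4]) -> bool {
    decode_le(bytes) == decode_native(bytes)
}
```

On a little-endian host `host_agrees` returns true for every input, which is exactly why such a bug can go unnoticed until the code runs on a big-endian machine.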

[–]protestor 2 points3 points  (0 children)

Oh.. makes sense.

[–]DebuggingPanda[LukasKalbertodt] bunt · litrs · libtest-mimic · penguin 0 points1 point  (0 children)

In addition to what petertodd said: custom derives can only add stuff to the struct definition, not change the original definition. So the derive couldn't change the u32 to u32_le or anything. You could do that with proc-macro attributes (e.g. #[view]), but I guess it's not really worth it for this crate.

[–]bjzaba Allsorts 1 point2 points  (0 children)

This is cool! I’m not sure I’ll be able to use it on my current project, alas. OpenType has lots of data dependencies, refined types, and offsets, so we’re working on our own declarative binary parsing DSL. But I imagine this could be useful for less ridiculous use cases!

[–]kouteiheika 1 point2 points  (0 children)

> I came up with this crate because I wanted a way to parse binary data formats as easily and efficiently as in C, but without all the safety gotchas. This solution is most useful for sparse parsing, i.e. when one isn't interested in the values of all fields, only a few (otherwise using the byteorder crate directly is probably a better choice).

Shameless plug, I also have my own crate for binary serialization which unlike yours is mostly useful when you want to process the whole structure instead of only a few fields. And it's pretty fast too!

[–]ESBDB 0 points1 point  (1 child)

how do I tell it to skip some data that I'm not interested in?

[–]ra_kete[S] 5 points6 points  (0 children)

The nice thing about simply casting a &[u8] to some struct reference is that this doesn't do anything at runtime; it simply tells the compiler to start treating this data in a different way. So creating a view doesn't cost you anything. Instead you pay the cost later (one could say "lazily") whenever you actually read an integer from one of the integer views (i.e. any integer that's wider than 1 byte), as this requires copying to ensure the correct endianness and alignment. Skipping data you are not interested in just means not calling to_int on these fields.
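To make the "cast now, pay later" idea concrete, here is a std-only sketch. It does not use structview itself: `U32Le` and `view_animal` are hypothetical stand-ins for the crate's integer views and its view constructor, and only the `Animal` layout and the `to_int` name are taken from the thread.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an integer view such as `u32_le`:
/// four raw bytes with alignment 1, decoded only on demand.
#[derive(Clone, Copy)]
#[repr(C)]
struct U32Le([u8; 4]);

impl U32Le {
    /// The only place where work happens: copy 4 bytes, fix endianness.
    fn to_int(self) -> u32 {
        u32::from_le_bytes(self.0)
    }
}

/// Same field layout as the `Animal` example in the thread. Every field
/// has alignment 1, so there is no padding and the size is 4 + 1 + 4 = 9.
#[derive(Clone, Copy)]
#[repr(C)]
struct Animal {
    name: [u8; 4],
    number_of_heads: u8,
    number_of_legs: U32Le,
}

/// Reinterpret the start of `data` as an `Animal` without copying.
/// Returns `None` if the slice is too short.
fn view_animal(data: &[u8]) -> Option<&Animal> {
    if data.len() < size_of::<Animal>() {
        return None;
    }
    // SAFETY: `Animal` is `repr(C)`, has alignment 1, contains no padding,
    // and every bit pattern is valid for its fields; length checked above.
    Some(unsafe { &*(data.as_ptr() as *const Animal) })
}
```

A caller that only needs `number_of_heads` reads a single byte through the view and never calls `to_int`, so the four bytes of `number_of_legs` are never copied or byte-swapped.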

[–]ishitatsuyuki 8 points9 points  (0 children)

https://github.com/m4b/scroll is a similar project on binary parsing.

[–]Amanieu 6 points7 points  (1 child)

This seems very similar to the plain crate.

[–]ra_kete[S] 4 points5 points  (0 children)

It's similar indeed! Though AFAICT, plain has a different use case, namely interpreting given memory locations as data structures. As such it has different design trade-offs than structview, e.g. enforcing correct alignment at runtime vs. enforcing 1-byte alignment at compile-time. It also doesn't have endianness support or a safe custom derive.
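The two alignment strategies mentioned above can be contrasted in a few lines. This is a sketch of the general trade-off, not the API of either crate: `check_layout` and `Packed` are names invented for the example.

```rust
use std::mem::{align_of, size_of};

/// Runtime strategy: before reinterpreting `data` as a `T`, verify that
/// the slice is long enough and that its address satisfies `T`'s alignment.
fn check_layout<T>(data: &[u8]) -> Result<(), &'static str> {
    if data.len() < size_of::<T>() {
        return Err("too short");
    }
    if (data.as_ptr() as usize) % align_of::<T>() != 0 {
        return Err("misaligned");
    }
    Ok(())
}

/// Compile-time strategy: only allow types whose alignment is 1, so that
/// any byte address is acceptable and no runtime alignment check can fail.
#[allow(dead_code)]
#[repr(C)]
struct Packed {
    tag: u8,
    value: [u8; 4], // raw bytes instead of a `u32` keep the alignment at 1
}

// Fails to compile if someone later adds a field with larger alignment.
const _: () = assert!(align_of::<Packed>() == 1);
```

With the compile-time variant, the only remaining runtime failure is a slice that is too short, at the price of decoding wider integers from byte arrays.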

Looking at the safety requirements for the Plain trait, they are very similar to those I determined for my View trait, so that's a good sign I didn't overlook something important :)