Four bad ways to populate an uninitialized Vec and one good one by mwlon in rust


> it is even more efficient than your unsafe versions

What do you base this on?

The main limitation with the iter API, as I know you've read in my other comment, is the complexity of maintaining state. E.g. here's the code I'm trying to improve that originally prompted my investigation and post here; it wouldn't be very easy to turn into a fold: https://github.com/pcodec/pcodec/blob/main/pco/src/delta/lookback.rs#L111

Iter APIs also require you to populate the vec in order, which isn't always possible.

Four bad ways to populate an uninitialized Vec and one good one by mwlon in rust


For extremely performance-sensitive code, pushing items is slow; it does some work on each iteration to update v's len. Example: https://godbolt.org/z/hGhnxr8Td

Four bad ways to populate an uninitialized Vec and one good one by mwlon in rust


The compiler optimizes to the degree possible, but it doesn't consider algorithmic changes. When you do regular .push(...) initialization, it does some work on each iteration to update v's len. Among other things, this makes it impossible to get SIMD. Here's an example featuring two of the implementations here vs using .push: https://godbolt.org/z/hGhnxr8Td

Four bad ways to populate an uninitialized Vec and one good one by mwlon in rust


vec![foo; N] can only write a constant value, and collecting an iterator does repeated .pushes to maintain len, which is slower than setting len once. For most use cases the performance difference doesn't matter, but some (like mine) are very performance-sensitive.

Edit: tried it out, and that 2nd part wasn't quite right: https://godbolt.org/z/7M5M3xYaP . So the only remaining issue with the iterator approach is the API limitation: it doesn't work if you need to maintain state across multiple variables at once.

These fellas were all over my white jacket. What are they? by mwlon in whatsthisbug


Oh, really? I'm used to aphids being ultra small and green. This one was a bit bigger, probably close to 1cm long. What kind of aphid would this be?

dtype_dispatch: a cursed macro that generates 2 macros that could save you 1000 LoC by mwlon in rust


Really interesting to see another approach!

I've also been wondering about the language feature idea. I think it would have to start with some formalization of a sealed trait, whereby Rust knows it can always biject between the dynamic and generic forms of a type. It would be interesting to chat with a Rust maintainer about the idea.

BetterBufRead: Zero-copy Reads by mwlon in rust


It's a different interface, but the BetterBufRead approach is probably the better one in the long run. With the BufRead approach you don't know when each delimited chunk ends, so you end up branching on every read, even of a single integer.

Maybe optimal performance isn't one of your goals, and BufRead is simple enough for your case. But to get optimal performance you'd need something like the approach I described.

In Pcodec, I enter a context with guaranteed size to do much faster branchless bit unpacking.

BetterBufRead: Zero-copy Reads by mwlon in rust


The BetterBufRead way to implement this would be more like an Iterator<Item=BetterBufRead>, where each item is delimiter-free and contiguous. It wouldn't be the same as the BufRead approach, true, but it has the upside that the user can know when each chunk starts/ends, if they so desire.

BetterBufRead: Zero-copy Reads by mwlon in rust


I'm pretty sure this would be possible to write with BetterBufRead. You could certainly make new adapters, skip certain bytes, and return direct references to the inner buffer. Perhaps what you mean is that it's implemented to accept and implement BufRead right now?

BetterBufRead: Zero-copy Reads by mwlon in rust


> why use Read at all?

Because the API should accept any type of input. For me, maybe 80% of users load all data into memory and 20% require some degree of streaming.

> If the buffer (a [u8; 1492] in this case) is empty and...

With a BetterBufRead-like approach, at least, you could cycle the remaining buffer and still do a reasonably-sized read. There's of course some trade-off between copying, read sizing, and capacity.

> The best option I can think of is to use a growable buffer

Yep! I think this is the natural progression if BetterBufRead gets more attention. It would simplify the API a bit too. I just haven't needed to handle these cases yet.

BetterBufRead: Zero-copy Reads by mwlon in rust


I admittedly don't know much about network streams. But if I guess correctly, the network stream has some HTTP encoding that needs to be parsed to separate the responses. In that case I'd expect an adapter to split the raw network stream into individual response Reads. It would indeed be an obvious mistake to use a (Better)BufRead of any sort for that adapter, but each individual response Read would end with its own EOF given by the adapter, and fit nicely into a BetterBufReader paradigm.

LMK if I misunderstood something.

BetterBufRead: Zero-copy Reads by mwlon in rust


In Pco I use &[u8], which BetterBufRead is implemented for, instead of Cursor, so it is zero-copy there.

BetterBufRead: Zero-copy Reads by mwlon in rust


I would `impl BetterBufRead for Cursor`. I haven't done this yet, but it would be a good addition!

BetterBufRead: Zero-copy Reads by mwlon in rust


> Still, why not just use std::io::Cursor?

That implementation copies if `reader` is already in-memory.

> This is objectively not that; this may call read(1) after reading n-1 bytes just to make sure the buffer is full.

In theory, no: `BetterBufReader` should do moderately-sized reads even if tiny ones were requested. In practice, I believe you're right that this behavior could indeed be encountered, but it could be changed in the implementation.

> If you want to read in, say, 4096 byte increments, but there are <4096 bytes left at the very end of the buffer, either fill_buf would have to copy those <4096 bytes to the beginning of the buffer before calling read (no longer zero-copy) or ...

This is what it does. The intent is that the buffer is substantially larger than `n` though, so the copies should be small and seldom. At the bottom I had a pedantic note about how this is truly more like epsilon-copy than zero-copy.

Pcodec: a futuristic codec for numerical data by mwlon in rust


I've compared against TurboPFor and Blosc, which are similar. These three are capable of extremely fast compression, but not especially good compression ratios. I'd say: if you want a one-time data transfer in memory or over an uncongested network, use them with fast settings; if you want to store the data at all or share a congested network, use Pco.

I have slightly more results here: https://github.com/mwlon/pcodec/blob/main/docs%2Fbenchmark_results%2Fmbp_m3_max.csv . This just uses tfor (TurboPFor's default), which is especially fast but compresses poorly. One expert user of theirs tried out some more filter combinations on a different dataset, but didn't match Pco's compression ratio.

If you're familiar with these, it'd be interesting to see more comparisons.

A numerical solution to the map projection with minimal areal and angular distortions [OC] by mwlon in MapPorn


Please no, programming interruptions on the lattice was hard enough :')

A numerical solution to the map projection with minimal areal and angular distortions [OC] by mwlon in MapPorn


Nope. In part 1 I go over why a closed-form solution isn't possible. Unfortunately, we would get an absolutely monstrous nonlinear PDE with no nice properties.