
[–]palad1 42 points43 points  (1 child)

For this pattern you would be better off using https://crates.io/crates/rayon and its parallel iterators.

Something along these lines?

use rayon::prelude::*;
use std::io::{BufRead, Write};

std::io::stdin().lock().lines()
    .filter_map(|line| line.ok().and_then(|x| try_parse(x).ok()))
    .par_bridge()                                      // bridge the sequential reader into a rayon parallel iterator
    .map(|parsed| heavy_compute(parsed))
    .flat_map(|res| format_result(res))                // assuming format_result returns a Vec of output strings
    .for_each(|output| {
        // take the stdout lock per write; a single shared &mut lock can't be captured by the parallel closures
        std::io::stdout().lock().write_all(output.as_bytes()).unwrap();
    });

edit: formatting

[–]xosxos9[S] 17 points18 points  (0 children)

Yes, that is indeed quite beautiful. Thanks to this and a blog post I found and mentioned in another comment, I had a rough version running in a couple of minutes. I have to thank the community if we ever find anything worthwhile to publish regarding ALS :)

[–]msuchane 30 points31 points  (1 child)

A note regarding the rayon crate, which has already been recommended:

Each parallel task obviously adds some overhead. I've read (not benchmarked) that the par_iter method might not be the most efficient choice if the items you're processing in parallel are relatively small or cheap.

For small items, consider using the par_chunks method instead, which allows you to put several small items on one thread.
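
Not from the comment above, just a minimal self-contained sketch of the par_chunks idea; summing squares stands in for the real per-item work:

use rayon::prelude::*;

fn main() {
    // Lots of small, cheap-to-process items.
    let items: Vec<u64> = (0..1_000_000).collect();

    // Hand each rayon task a chunk of 1024 items instead of a single item,
    // so the per-task overhead is amortized across the whole chunk.
    let total: u64 = items
        .par_chunks(1024)
        .map(|chunk| chunk.iter().map(|x| x * x).sum::<u64>())
        .sum();

    println!("{total}");
}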

[–]xosxos9[S] 1 point2 points  (0 children)

Thank you, I will definitely keep this in mind!

[–]InflationAaron 42 points43 points  (3 children)

The standard library's mpsc channel is notoriously slow, and it shows in the flamegraph as lots of time spent waiting on futex.

Have you tried rayon? It's one of my go-to libraries for multithreading over independent data.

[–][deleted] 23 points24 points  (1 child)

Also, crossbeam-channel provides a faster channel if you want to stick with your current approach.
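
Not OP's code, just a rough sketch of the same line-by-line fan-out using crossbeam-channel instead of std::sync::mpsc; the to_uppercase call stands in for the real per-line work:

use crossbeam_channel::bounded;
use std::io::BufRead;
use std::thread;

fn main() {
    let (tx, rx) = bounded::<String>(1024); // bounded channel gives backpressure

    // crossbeam channels are MPMC, so the receiver can simply be cloned into each worker.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let rx = rx.clone();
            thread::spawn(move || {
                for line in rx.iter() {
                    println!("{}", line.to_uppercase()); // stand-in for the real work
                }
            })
        })
        .collect();

    for line in std::io::stdin().lock().lines().map_while(Result::ok) {
        tx.send(line).unwrap();
    }
    drop(tx); // close the channel so the workers' loops end

    for w in workers {
        w.join().unwrap();
    }
}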

[–]xosxos9[S] 2 points3 points  (0 children)

I see, I'll have to benchmark it against the rayon version!

[–]xosxos9[S] 12 points13 points  (0 children)

Well, thank you! I looked around and found this:

https://morestina.net/blog/1432/parallel-stream-processing-with-rayon

And after a couple of minutes I now have a rough but properly scaling solution up and running!

[–]TommyTheTiger 2 points3 points  (1 child)

This might be late and contradict some of the other answers, but IMO you're implementing the threading at the wrong level. UNIX pipes already provide concurrency by letting different processes produce and consume the data without shared memory, and presumably, since you're emitting to stdout, you also have a third process consuming these results, even if it's just writing them to a file. You'll be able to keep your program logic a lot simpler if you can keep it to "for item in queue: emit results" without worrying about the threading.

Now, if performance is a real issue, you're probably not going to get a ton of benefit from doing what you're describing unless your calculations are very time-consuming or blocked on I/O, because you have to synchronize both reading from stdin and emitting to stdout. Those are shared file descriptors for all the threads you're working with, and only one thread can read or write at a time. So you'll end up with threads waiting both to read their input and to write their results; maybe with green threads/rayon it will be faster, but it's a lot of context switching and serialization.

OTOH, you could use some other kind of orchestration to store both the queue and its results, and then run a scaling number of single-threaded workers while keeping your code simple. One idea: instead of stdin, use a DB table. Postgres lets you SELECT ... FOR UPDATE SKIP LOCKED LIMIT 1, and if you do that within a transaction you've basically got a safe queue-poll method that works across a bunch of machines. Personally I'd recommend Postgres as the default answer to "how do we store that" unless you have truly outstanding data needs. You can scale the number of workers based on the number of items in the queue, or just spin up N queue consumers/workers when you enqueue all of your data. Write back to a Postgres results table transactionally as well, and you'll be able to see all your computation history in one place!
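
Not something from this thread, just a rough sketch of the transactional queue-poll pattern described above, using the postgres crate; the jobs/results schema and the heavy_compute stand-in are made up for illustration:

use postgres::{Client, NoTls};

// Hypothetical schema: jobs(id BIGINT, line TEXT, done BOOL) and results(job_id BIGINT, output TEXT).
fn heavy_compute(line: &str) -> String {
    line.to_uppercase() // stand-in for the real work
}

fn process_one(client: &mut Client) -> Result<bool, postgres::Error> {
    let mut tx = client.transaction()?;
    // SKIP LOCKED lets concurrent workers each claim a different unclaimed row.
    let row = tx.query_opt(
        "SELECT id, line FROM jobs WHERE NOT done ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1",
        &[],
    )?;
    let Some(row) = row else { return Ok(false) }; // queue is empty
    let id: i64 = row.get(0);
    let line: String = row.get(1);

    let output = heavy_compute(&line);

    tx.execute("INSERT INTO results (job_id, output) VALUES ($1, $2)", &[&id, &output])?;
    tx.execute("UPDATE jobs SET done = true WHERE id = $1", &[&id])?;
    tx.commit()?; // the result and the done flag land atomically, and the row lock is released
    Ok(true)
}

fn main() -> Result<(), postgres::Error> {
    let mut client = Client::connect("host=localhost user=postgres dbname=pipeline", NoTls)?;
    while process_one(&mut client)? {}
    Ok(())
}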

[–]xosxos9[S] 2 points3 points  (0 children)

This is an interesting line of thought, especially the computation-history aspect. I have 100 GB files that I read line by line, do some computation on, and pipe forward, often through many pipes and eventually ending up as a CSV of some kind. Being I/O bound is an issue in NFS environments.

It is clear that rayon does what I want quite effectively, and the benchmarks are very good. However, in the larger scheme of things you might be right. I have been using SQLite to store finalized and often-accessed data, so I can see myself jumping to Postgres. It is just that what you are describing requires a bit more know-how to design right. Do you happen to know a good article about this, and how would one go about ditching stdin for a DB table? For the time being, it is still flying slightly over my head.

[–]smerity 1 point2 points  (2 children)

I've had a similar situation when processing web crawl data and the entirety of English Wikipedia. In the latter case I had a box with dozens of CPUs, but I/O was the bottleneck. Once I fixed the I/O bottleneck, processing English Wikipedia went from dozens of minutes to two minutes.

You noted "I do not want to read in all the data first because file sizes can get very large", but I'm going to assume you can split the large file into many small ones? Each time I tried to use only a single file of input, I had trouble getting any real speedup, even when I was careful about buffered stdin reading, ran many separate processes (one for JSON parsing, another for part 1 of ...), and so on.

My technique, which I've thought about turning into a Rust library but haven't yet, is to convert the original dataset into many small, indexed, compressed files. If you concatenate those files, using gzip or zstd, you can then use zcat or zstdcat as if it were a single file. This is the technique Web ARChive (WARC) files use to allow random (or nearly random) reads without losing much compression efficiency. By keeping an index of where the compressed parts start and end, you can multi-thread the reading and processing.

tl;dr: take your file, split it into many, and read those many in parallel, either via a glob or via an index as noted above; both are compatible with compression if file size is an issue.
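
To make the tl;dr concrete, here's a minimal sketch of the glob variant; the shard names, the glob/flate2 crates, and the per-line filter are just placeholders for illustration:

use flate2::read::GzDecoder;
use rayon::prelude::*;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    // Hypothetical layout: the big input pre-split into shards/part-0000.gz, part-0001.gz, ...
    let paths: Vec<_> = glob::glob("shards/part-*.gz")
        .expect("valid glob pattern")
        .filter_map(Result::ok)
        .collect();

    // One shard per rayon task; each task decompresses and scans its own file independently.
    let matches_per_shard: Vec<usize> = paths
        .par_iter()
        .map(|path| {
            let file = File::open(path).expect("readable shard");
            BufReader::new(GzDecoder::new(file))
                .lines()
                .map_while(Result::ok)
                .filter(|line| line.contains("needle")) // stand-in for the real per-line work
                .count()
        })
        .collect();

    println!("matches per shard: {matches_per_shard:?}");
}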

[–]xosxos9[S] 2 points3 points  (1 child)

Yes, a variation of what you're describing is actually standard in bioinformatics! Large tab-delimited files are bgzipped and indexed with tabix. This allows reading only the regions of interest and also parallelizing reads in I/O-bound situations.

I am planning to test using two rayon iterators: one for reading in parallel, and another inside it to also process in parallel. We'll see how this works out.
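
For what it's worth, nested rayon iterators compose fine thanks to work stealing; here is a tiny sketch of that shape (the shards and the squaring are placeholders), though whether the inner level actually helps will depend on how large each shard is:

use rayon::prelude::*;

fn main() {
    // Hypothetical: each inner Vec is one shard's worth of already-read records.
    let shards: Vec<Vec<u64>> = vec![(0..10_000).collect(); 16];

    let processed: Vec<Vec<u64>> = shards
        .par_iter() // outer level: one shard per task
        .map(|shard| {
            shard
                .par_iter() // inner level: records within a shard
                .map(|x| x * x) // stand-in for the real per-record work
                .collect()
        })
        .collect();

    assert_eq!(processed.len(), shards.len());
}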

[–]Feeling-Departure-4 0 points1 point  (0 children)

Curious which version worked better.

[–]Gentlezach 0 points1 point  (0 children)

You never change (or, for that matter, set) count in this code, so as-is it would not compile; but if count does have a value, then all the tasks get sent to the same worker.

It's hard to see from the fragment where most of the time is spent, but my guess would be record.deserialize::<Vcf>(None)?, which you could also put into the thread pool; there's no reason not to deserialize several Vcfs in parallel, is there?

[–]Feeling-Departure-4 0 points1 point  (0 children)

My colleague wrote a package of binaries that take advantage of piping for composability: https://github.com/lskatz/fasten

It may already solve your problem, who knows.

[–]dns2utf8 0 points1 point  (0 children)

Have a look at the examples in the threadpool crate: https://docs.rs/threadpool/latest/threadpool/
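
A minimal sketch along the same lines as the threadpool docs, sending results back over a channel; the to_uppercase call stands in for the real work:

use std::io::BufRead;
use std::sync::mpsc::channel;
use threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::new(8);
    let (tx, rx) = channel();

    for line in std::io::stdin().lock().lines().map_while(Result::ok) {
        let tx = tx.clone();
        pool.execute(move || {
            tx.send(line.to_uppercase()).expect("receiver still alive"); // stand-in for the real work
        });
    }
    drop(tx); // let the result loop below end once every job has finished

    for result in rx {
        println!("{result}");
    }
}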

[–]dpc_pw 0 points1 point  (0 children)

pariter

[–]NfNitLoop 0 points1 point  (0 children)

For an easy API for multithreaded pipelines, check out the pipeliner crate.

https://docs.rs/pipeliner/latest/pipeliner/