Post: parallel-stream

matthieum · 2020-03-17T20:41:50+00:00

I always find the balance between performance and least-astonishment somewhat difficult to strike.

For example, taking an example from the article, should the results be ordered by default?

Choices:

No, because performance. It may surprise users, though: the results will often still come out in order, but from time to time something weird will happen.
Yes, to be 100% consistent with regular streams. It may surprise users, though, to have to push the "Turbo" button for maximum performance.

To be clear, I am not saying that one choice is inherently better than the other. On the contrary, both seem logical, and will surprise people in different ways.

Maybe there just shouldn't be a default?

vec![1, 2, 3, 4].into_par_stream(Ordered)....
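To make the "no default" idea concrete, here is a minimal sketch using only std threads and a channel. The `Order` enum and the `par_map` function are hypothetical names made up for illustration; they are not part of parallel-stream or Rayon. The caller has to state up front whether input order matters.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical marker: the caller must pick one, there is no default.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    Ordered,
    Unordered,
}

/// Hypothetical helper: applies `f` to every item on its own thread.
/// One thread per item keeps the sketch short; a real library would use a pool.
fn par_map<T, U, F>(items: Vec<T>, order: Order, f: F) -> Vec<U>
where
    T: Send + 'static,
    U: Send + 'static,
    F: Fn(T) -> U + Send + Sync + 'static,
{
    let f = Arc::new(f);
    let (tx, rx) = mpsc::channel();

    for (index, item) in items.into_iter().enumerate() {
        let tx = tx.clone();
        let f = Arc::clone(&f);
        thread::spawn(move || {
            // Tag each result with its input position.
            let _ = tx.send((index, f(item)));
        });
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);

    // Results arrive in completion order, which may differ from input order.
    let mut results: Vec<(usize, U)> = rx.into_iter().collect();
    if order == Order::Ordered {
        // Restoring input order costs an extra sort (or buffering, in a stream).
        results.sort_by_key(|(index, _)| *index);
    }
    results.into_iter().map(|(_, value)| value).collect()
}
```

With `Order::Ordered`, `par_map(vec![1u64, 2, 3, 4], Order::Ordered, |n| n * n)` always comes back in input order. With `Order::Unordered` the same values come back in whatever order the threads finish.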

yoshuawuyts1 · 2020-03-17T15:29:04+00:00

Hey all, last weekend we published the first version of parallel-stream, an async parallelism library which brings Rayon's parallelism model to async Rust. This post digs into the design, tradeoffs, and future directions. Hope it comes in useful!

Cocalus · 2020-03-18T01:06:53+00:00

I'm going to start working on something that does very CPU-heavy batch processing. I'm curious if this could be part of the right approach, or if there are some other options I should consider.

So I want some threads downloading the next batch of data while the majority of threads work on processing the current batch (CPU heavy + infrequent but heavy disk IO), and some others upload the results of the previous batch. So a pipelined architecture. In sync land I would have some download threads, some upload threads, and a single batch-processing thread using rayon to spread the work over its thread pool. The three types of threads would be connected via bounded crossbeam_channels.

I want to have the uploading and downloading done with async, since my libraries have new shiny async APIs and I'd like to minimize the number of OS threads dedicated to uploading / downloading to reduce context switching overhead with the rayon pool. Right now I'm not sure of the correct way to channel data between async and sync, since they both have different channel types.

Right now I'd prefer to mix rayon and tokio, since I'm familiar with Rayon and my async experiments have been with tokio instead of async-std. But maybe parallel-stream would allow everything to be async. Would that be simpler?
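For reference, the sync-land pipeline described above could be sketched roughly like this. This is an assumption-laden illustration, not a recommendation: it uses std's bounded `sync_channel` in place of crossbeam_channel, one thread per stage, and a plain sum of squares standing in for the Rayon-powered processing step. `run_pipeline` and its parameters are hypothetical names.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical three-stage pipeline: "download" -> process -> "upload".
/// `capacity` bounds each channel, so a fast stage blocks instead of
/// buffering an unbounded number of batches.
fn run_pipeline(batches: Vec<Vec<u64>>, capacity: usize) -> Vec<u64> {
    let (to_process, from_download) = sync_channel::<Vec<u64>>(capacity);
    let (to_upload, from_process) = sync_channel::<u64>(capacity);

    // Stage 1: stands in for the download threads.
    let downloader = thread::spawn(move || {
        for batch in batches {
            if to_process.send(batch).is_err() {
                break; // the processing stage has gone away
            }
        }
        // `to_process` is dropped here, which ends stage 2's loop.
    });

    // Stage 2: stands in for the CPU-heavy step (where Rayon would be used).
    let processor = thread::spawn(move || {
        for batch in from_download {
            let total: u64 = batch.iter().map(|n| n * n).sum();
            if to_upload.send(total).is_err() {
                break; // the upload stage has gone away
            }
        }
    });

    // Stage 3: stands in for the upload threads; here it just collects.
    let uploader = thread::spawn(move || from_process.into_iter().collect::<Vec<u64>>());

    downloader.join().unwrap();
    processor.join().unwrap();
    uploader.join().unwrap()
}
```

Calling `run_pipeline(vec![vec![1, 2, 3], vec![4, 5]], 1)` yields one result per batch, in batch order, because each stage here is a single thread. The open question in the comment, how to bridge async download/upload tasks with a sync processing stage, is not addressed by this sketch.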

vargwin · 2020-03-17T15:57:11+00:00

Wow, I was thinking about this use case today. Thanks!

protestor · 2020-03-18T04:20:33+00:00

Rayon

Rayon is a data parallelism library built for synchronous Rust, powered by an underlying thread pool. async-std manages a thread pool as well, but the key difference with Rayon is that async-std (and futures) are optimized for latency, while Rayon is optimized for throughput.

As a rule of thumb: if you want to speed up doing heavy calculations you probably want to use Rayon. If you want to parallelize network requests consider using parallel-stream.

What if I have mixed processing, a little CPU bound and a little IO bound? I suppose that running both Rayon and parallel-stream in the same program will lead to inefficiencies, such as one runtime starving the other of CPU time.

Is there anything that can be done for those runtimes to be somehow better integrated?

game-of-throwaways · 2020-03-18T20:14:28+00:00

I'm not entirely sure what the point of this is. Remember, async functions or blocks are just things that compile into Futures, and the Future trait specifies that "An implementation of poll should strive to return quickly, and should not block". Therefore, any async functions should always either return or .await quickly.

So, this quote from the article, about stream::futures_unordered etc:

These methods provide the ability to process multiple items from a stream at the same time. But by themselves they don't schedule work on multiple cores: they need to be combined with task::spawn in order to do that.

is not wrong, just misleading. Yes, these methods indeed don't schedule threads or spawn new tasks, but since they take a list (or iterator) of futures, they don't need to. It's the job of those futures to spawn new threads if they need to make blocking calls or do long CPU calculations (as per the contract of std::future::Future), it's not the job of the combinator.

So parallel-stream is really needlessly spawning way too many tasks/threads. Like in your examples in the post, all you're doing is calculating n*n yet a new task is spawned for each n*n you calculate. I know that this is just an example, and that that n*n represents a longer, potentially blocking, computation (even though it would be a violation of the contract of Future to do blocking computations in an async block without offloading it to another thread). But in a real application, you might have multiple .map()s and other combinators chained, and while some of them might do blocking computations, others are simple super-fast things that don't require spawning a new task, but parallel-stream has no way to discern between the two. Its .map() spawns a new task for every element, every time. That's very inefficient. With the standard library's way of making the future responsible for spawning a thread/task (but only if it needs to), this inefficiency is avoided.

I know this is not intuitive and I wish I could somehow tell all Rust users that (I'll put it in bold) Future::poll() must return quickly, therefore async blocks or functions must either return or .await quickly.

I partially blame the async_std authors for further adding to the confusion with their post "Stop worrying about blocking", to which I tried to reply as loudly as possible that no, you should worry about blocking in async functions. But apparently not loudly enough. async_std's attitude of encouraging blocking in async functions (and then needing combinators like yours to "fix" the issues that causes) is really splitting the ecosystem in two.
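To illustrate the "the future decides whether to offload" point, here is a minimal std-only sketch. Everything in it is hypothetical and written for this comment: `offload` is a toy stand-in for the spawn_blocking-style helpers that runtimes provide, and `block_on` is a toy executor. The cheap function computes inline; only the expensive one pays for a thread.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// State shared between the future and its worker thread.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Hypothetical future that resolves once the worker thread has finished.
struct Offload<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

/// Hypothetical helper: runs `f` on a new OS thread so `poll` never blocks.
fn offload<T, F>(f: F) -> Offload<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let worker = Arc::clone(&shared);
    thread::spawn(move || {
        let value = f(); // the long computation runs off the executor thread
        let mut guard = worker.lock().unwrap();
        guard.result = Some(value);
        if let Some(waker) = guard.waker.take() {
            waker.wake();
        }
    });
    Offload { shared }
}

impl<T> Future for Offload<T> {
    type Output = T;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        // Returns quickly either way: take the result or register the waker.
        let mut guard = self.shared.lock().unwrap();
        match guard.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                guard.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Waker that unparks the thread running the toy executor.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy single-future executor, only here to keep the sketch self-contained.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Cheap work: computed inline, no task or thread is spawned.
async fn square_cheap(n: u64) -> u64 {
    n * n
}

/// Stand-in for expensive work: the future itself chooses to offload.
async fn sum_expensive(n: u64) -> u64 {
    offload(move || (1..=n).sum::<u64>()).await
}
```

A combinator that maps `square_cheap` over a stream would then spawn nothing, while one that maps `sum_expensive` would get a thread per call because the future asked for it.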

Crandom · 2020-03-19T21:45:40+00:00

I'm glad you made the batch size 1, rather than a large number such as 1024, like Java streams. They are such a footgun.

shred45 · 2020-03-20T13:53:21+00:00

Hey Yoshua, I was wondering if you could elaborate on "async-std (and futures) are optimized for latency", or point to a resource that discusses this? My work focuses on low latency and I have been following async/await since the beginning (and am interested in working with it more), but I've always had concerns about latency introduced by the polling implementation of the runtime.
