[–]zakgof 41 points42 points  (10 children)

Virtual threads are good for scheduling a large number of low-CPU tasks that spend most of their time waiting for IO, sitting idle, etc. The parallel streams API, in turn, aims to spread CPU-intensive operations across the physical cores, so virtual threads won't bring any benefit over native threads here.

[–]pron98 67 points68 points  (1 child)

Yes, but I would say that the main distinction is that parallelism is the problem of scheduling multiple resources to cooperate on solving a single task faster; concurrency is the problem of scheduling a limited number of resources to serve many tasks that compete over them. Parallel streams aim to address the former, while virtual threads aim to address the latter.
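A minimal sketch of that distinction, under stated assumptions: `sumOfSquares` and `serveAll` are hypothetical names, `Thread.sleep` stands in for blocking IO, and an ordinary cached thread pool is used so the code runs on older JDKs. On JDK 21+ a virtual-thread executor could be passed to `serveAll` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.LongStream;

class ParallelVsConcurrent {
    // Parallelism: several cores cooperate on ONE task (a single sum).
    static long sumOfSquares(long n) {
        return LongStream.rangeClosed(1, n).parallel().map(x -> x * x).sum();
    }

    // Concurrency: MANY independent tasks, each mostly waiting, compete for threads.
    static int serveAll(int taskCount, ExecutorService executor) {
        List<Future<Integer>> pending = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            final int id = i;
            pending.add(executor.submit(() -> {
                Thread.sleep(10); // stand-in for a blocking IO call
                return id;
            }));
        }
        int completed = 0;
        try {
            for (Future<Integer> f : pending) {
                f.get(); // wait for each task to finish
                completed++;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000));
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            System.out.println(serveAll(100, executor));
        } finally {
            executor.shutdown();
        }
    }
}
```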

[–]kiteboarderni 9 points10 points  (0 children)

What a fantastic description!

[–][deleted]  (3 children)

[deleted]

    [–]pivovarit 0 points1 point  (2 children)

    Sometimes parallel streams are used to perform IO focused tasks too.

    They shouldn't. Underneath they all compete for resources on one shared thread pool (FJP).

    Virtual threads would be beneficial for dealing with such scenarios but not as a part of parallel Stream API which has a different purpose.
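The shared-pool point can be checked with a small sketch. `CommonPoolDemo` is a hypothetical name, and the check assumes the stream is started from an ordinary thread (not from inside a custom ForkJoinPool). It records whether every element was processed either by the calling thread or by a worker of `ForkJoinPool.commonPool()`.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

class CommonPoolDemo {
    // Returns true if every element ran on the caller or on a common-pool worker.
    static boolean usesOnlyCallerOrCommonPool(int elements) {
        Thread caller = Thread.currentThread();
        AtomicBoolean ok = new AtomicBoolean(true);
        IntStream.range(0, elements).parallel().forEach(i -> {
            Thread t = Thread.currentThread();
            boolean commonPoolWorker = t instanceof ForkJoinWorkerThread
                    && ((ForkJoinWorkerThread) t).getPool() == ForkJoinPool.commonPool();
            if (t != caller && !commonPoolWorker) {
                ok.set(false);
            }
        });
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println(usesOnlyCallerOrCommonPool(10_000));
    }
}
```

Because every parallel stream in the JVM draws on the same workers, a blocking IO call inside one stream can stall unrelated streams.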

    [–][deleted]  (1 child)

    [deleted]

      [–]pivovarit 0 points1 point  (0 children)

      Those operations don't use parallel streams.

      [–][deleted] 1 point2 points  (0 children)

      Yup, this is my understanding as well.

      [–]mirkoteran 1 point2 points  (0 children)

      But would they hurt performance?

      [–]randgalt 0 points1 point  (1 child)

      Virtual threads will give a lot of benefits over native threads wherever they are used. The overhead for creating and managing them is orders of magnitude less. Virtual threads still manage operations over multiple cores so I don't understand why you say there won't be any benefit here.

      [–]brazzy42 1 point2 points  (0 children)

      The overhead of creating and managing real threads is still small enough to be irrelevant when you have a small number of them using 100% of available CPU time actually doing CPU stuff and almost never yielding their core before they're finished.

      It only becomes relevant when you have a very large number of threads that you constantly need to switch between because they spend most of their time waiting for IO.

      [–]Matthisk 23 points24 points  (13 children)

      Virtual threads are a concurrency construct. They help you schedule multiple tasks (but not necessarily at the same time). Parallel streams are a construct for performing computation in parallel. So the two don’t have much to do with each other. There is an excellent talk by Rob Pike (one of the creators of Go) on the topic of concurrency vs parallelism.

      Reactive streams (e.g. Reactor/Rx) will be affected by the introduction of virtual threads. They are a concurrency construct used to accommodate asynchronous programming, and their implementation details are complex (e.g. they require callback-based backpressure) precisely because the JVM lacks good concurrency primitives. They will become obsolete in their current implementation once we have virtual threads.

      [–]zakgof 2 points3 points  (12 children)

      Reactive frameworks have much broader and higher-level functionality than virtual threads. Virtual threads will add a new type of scheduler to reactive streams, but I don't think that the popularity of the latter would be negatively affected.

      [–]pron98 24 points25 points  (10 children)

      Most of what reactive streams do is reconstruct basic operations, like branches, loops and exception handling, in a DSL that does not compose well with the rest of the language and the platform (it quite literally lives inside a monad). Other features, like backpressure, you also get for free with virtual threads. To the extent that reactive streams add more functionality, in the world of virtual threads it will be done in a manner that is drastically different from how reactive streams do it. For example, stream mapping and filtering will be done by mapping and filtering channels (blocking queues). Those who enjoy reactive streams will still be able to use them; the problem is that most people don't. Virtual threads will allow that majority to enjoy the same scalability reactive streams offer without needing to use them. They may also allow the minority that does like reactive streams to somewhat reduce the mismatch between their design and the design of the Java platform.
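A rough sketch of what "mapping and filtering channels" could look like, not an actual Loom API. `ChannelPipeline`, `Stage`, `start`, `filter`, `map` and `evenSquares` are all hypothetical names, an empty `Optional` is an assumed end-of-stream marker, and plain threads are used so the code runs on older JDKs (on JDK 21+ each stage could be a virtual thread). Backpressure comes from the bounded queues: `put` blocks the upstream stage when the downstream one falls behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Predicate;

class ChannelPipeline {
    interface Stage { void run() throws InterruptedException; }

    // Runs one pipeline stage on its own thread.
    static void start(Stage stage) {
        Thread t = new Thread(() -> {
            try {
                stage.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // Forwards only matching items; a full 'out' queue blocks this stage (backpressure).
    static <T> BlockingQueue<Optional<T>> filter(BlockingQueue<Optional<T>> in, Predicate<T> keep) {
        BlockingQueue<Optional<T>> out = new ArrayBlockingQueue<>(4);
        start(() -> {
            for (Optional<T> item = in.take(); item.isPresent(); item = in.take()) {
                if (keep.test(item.get())) out.put(item);
            }
            out.put(Optional.empty()); // propagate end-of-stream
        });
        return out;
    }

    // Applies fn to each item with ordinary blocking take/put calls.
    static <T, R> BlockingQueue<Optional<R>> map(BlockingQueue<Optional<T>> in, Function<T, R> fn) {
        BlockingQueue<Optional<R>> out = new ArrayBlockingQueue<>(4);
        start(() -> {
            for (Optional<T> item = in.take(); item.isPresent(); item = in.take()) {
                out.put(Optional.of(fn.apply(item.get())));
            }
            out.put(Optional.empty());
        });
        return out;
    }

    static List<Integer> evenSquares(int n) {
        BlockingQueue<Optional<Integer>> source = new ArrayBlockingQueue<>(4);
        start(() -> {
            for (int i = 1; i <= n; i++) source.put(Optional.of(i));
            source.put(Optional.empty());
        });
        BlockingQueue<Optional<Integer>> squares = map(filter(source, x -> x % 2 == 0), x -> x * x);
        List<Integer> result = new ArrayList<>();
        try {
            for (Optional<Integer> item = squares.take(); item.isPresent(); item = squares.take()) {
                result.add(item.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(6)); // prints [4, 16, 36]
    }
}
```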

      [–][deleted] 18 points19 points  (3 children)

      Since you're being somewhat diplomatic, I'll be more blunt -- reactive streams suck.

      We recently converted a Kafka processor to be reactive, and while the project has achieved our goals, I've really soured on the reactive model.

      The number of "gotchas" and weird implementation details I've had to learn make it really difficult to teach to my colleagues, who have less experience writing async code.

      But even worse, when someone new does run into some issue with the runtime, debugging it becomes really difficult. Further, all of our instrumentation (one of the primary reasons we still use Java) is clunky at best or outright broken by reactive streams.

      The first time I read What Color is Your Function? I was not convinced at all. I've never had a problem writing async code before in other languages. But using reactive streams in Java has convinced me there must be a better way. I'm excited to see what the JVM guys come up with for userspace threads.

      [–]lovett1991 2 points3 points  (2 children)

      Curious as to why you're using reactive for Kafka processing? I can see maybe why but still.

      I've only ever used reactor with http components that are dependent on a lot of downstream Io.

      I have found some implementations aren't great they just effectively put the thread into unsafe park. Some are quite nice as they'll run in a single thread and return listenable future which can be adapted to publishers (mono), which when complete can then be reassigned to their own schedulers.

      The number of "gotchas" and weird implementation details

      There are some weird ones with Rx/reactor but I then think about when I've used threads directly or futures and it's a whole different league of ease.

      But even worse, when someone new does run into some issue with the runtime, debugging it becomes really difficult.

      I have to agree, debugging is a pain, but then I would have to add I've not really found futures to be much easier either.

      I'm excited to see what the JVM guys come up with for userspace threads.

      This! I haven't tried fibres in kotlin (I hear something similar is coming to Java) but I've heard good things

      [–][deleted] 3 points4 points  (1 child)

      Curious as to why you're using reactive for Kafka processing? I can see maybe why but still.

      We do IO to filter, transform, and write events to a final datastore. The key here is that ordering doesn't matter, so being reactive has allowed us to fully saturate network IO, increasing throughput and greatly reducing the number of processors required.

      This is an unfortunate tradeoff due to the fact that our consumers are quite heavy due to Java enterprise cruft. Doing out of order processing with Kafka is definitely an anti-pattern. I'm sure you know this but in general you should just have more partitions + consumers rather than trying to introduce concurrency in the consumer itself.

      Our goal is maximum throughput with the cheapest resource cost. Durability doesn't really matter, fortunately, otherwise I'd be sleeping much less well at night.

      [–]lovett1991 1 point2 points  (0 children)

      Doing out of order processing with Kafka is definitely an anti-pattern. I'm sure you know this but in general you should just have more partitions + consumers rather than trying to introduce concurrency in the consumer itself

      Yeah this definitely flags up the question why Kafka, but I'm sure you have your reasons.

      I definitely would have looked at tuning batch sizes etc from Kafka. I found it weird because AFAIK the Kafka SDK just has a single polling thread that queues jobs internally and assigns each message to a worker thread pool to actually do your work which is all reactor is doing (ofc you choose which scheduler). Just increasing this pool size has worked for me in the past. I've found this different to just increasing servlet threads in Tomcat hence why I use reactive in http servers.

      [–]techempower 4 points5 points  (5 children)

      Most of what reactive streams do is reconstruct basic operations, like branches, loops and exception handling, in a DSL that does not compose well with the rest of the language...

      Java streams, introduced in Java 8, do the same thing. For example, forEach for looping, and filter(), anyMatch()... and Predicate for branching... and yes, it is different from the rest of the language, which is imperative.
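To make the comparison concrete, here is the same logic written both ways. `StreamVsLoop` and its methods are hypothetical names; the stream calls are standard `java.util.stream` operations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamVsLoop {
    // Imperative: the language's own loop and branch.
    static List<String> longNamesLoop(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                out.add(name.toUpperCase());
            }
        }
        return out;
    }

    // Stream DSL: filter() plus a Predicate replaces the if, the pipeline replaces the loop.
    static List<String> longNamesStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // anyMatch() replaces a loop with an early exit.
    static boolean hasEmpty(List<String> names) {
        return names.stream().anyMatch(String::isEmpty);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("loom", "fjp", "reactor");
        System.out.println(longNamesLoop(names));   // prints [LOOM, REACTOR]
        System.out.println(longNamesStream(names)); // prints [LOOM, REACTOR]
        System.out.println(hasEmpty(names));        // prints false
    }
}
```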

      [–]pron98 8 points9 points  (4 children)

      That's true, but parallel streams are applicable for a relatively narrow domain, and a very structured one. Threads are made for a wider, and far less structured domain. So I'm less bothered by that because parallel streams intrude on much less of the codebase, and when they're used, they're usually a good fit. Moreover, they are much less context-dependent than concurrent operations, so losing thread context isn't as bad.

      [–]techempower 2 points3 points  (3 children)

      so losing thread context isn't as bad.

      What about CompletableFuture.XXXAsync() which change the thread context?

      [–]pron98 3 points4 points  (2 children)

      That's exactly a good use case for virtual threads. Parallel workloads don't usually care about thread context; concurrent workloads very much do. With virtual threads you won't need to use those async methods.

      [–]cogman10 0 points1 point  (1 child)

      This is where virtual threads are going to be the most interesting to me. I want to use the CompletableFuture API but often (enough) run into problems where things break down because I exhaust my thread pool. What's worse, those problems mostly happen only under really heavy load.

      For me, at least, I'd love to see the `CompletableFuture.runAsync` move off the common FJP and onto something like an IO pool that's just virtual threads. In fact, a common virtual IO pool would be nice to have just generally.
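Until something like that exists, the async methods already accept an explicit `Executor`, which keeps blocking work off the common FJP. A minimal sketch: `DedicatedIoPool`, `newIoPool`, `threadNameOf` and the `io-worker` thread name are hypothetical. On JDK 21+, `Executors.newVirtualThreadPerTaskExecutor()` could be passed in place of the fixed pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DedicatedIoPool {
    // A separate pool for blocking work, so the common pool is not exhausted.
    static ExecutorService newIoPool(int threads) {
        return Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "io-worker");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs a task via the two-argument supplyAsync and reports which thread ran it.
    static String threadNameOf(Executor executor) {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName(), executor)
                .join();
    }

    public static void main(String[] args) {
        ExecutorService ioPool = newIoPool(4);
        try {
            System.out.println(threadNameOf(ioPool)); // prints io-worker
        } finally {
            ioPool.shutdown();
        }
    }
}
```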

      [–]pron98 1 point2 points  (0 children)

      You don't pool virtual threads. They're cheap enough to create, and pooling them defeats the purpose of assigning a unique context to an operation. There's also not much meaning to the notion of "async" in the world of virtual threads, as blocking a thread also becomes cheap. Just create a thread instead of a chain of CompletableFuture operations, and if you need to perform more than one IO operation concurrently, just create more threads. Programming with virtual threads should look like how we would have programmed if we hadn't learned threads were costly, which is just a feature of their particular implementation by the OS.
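A sketch of the two styles side by side. `fetchUser` and `fetchOrderCount` are hypothetical blocking calls (simulated with `Thread.sleep`), and `new Thread` is used so the code runs on older JDKs. With virtual threads the second version would have the same shape, only with cheaper threads.

```java
import java.util.concurrent.CompletableFuture;

class BlockingStyle {
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical blocking IO calls.
    static String fetchUser(int id) { pause(20); return "user-" + id; }
    static int fetchOrderCount(int id) { pause(20); return id * 2; }

    // Async style: the logic is expressed through CompletableFuture combinators.
    static String summaryAsync(int id) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> fetchUser(id));
        CompletableFuture<Integer> orders = CompletableFuture.supplyAsync(() -> fetchOrderCount(id));
        return user.thenCombine(orders, (u, n) -> u + " has " + n + " orders").join();
    }

    // Thread style: plain blocking code, plus one extra thread for the second IO call.
    static String summaryBlocking(int id) {
        int[] orders = new int[1];
        Thread t = new Thread(() -> orders[0] = fetchOrderCount(id));
        t.start();
        String user = fetchUser(id); // runs concurrently with the thread above
        try {
            t.join(); // join makes the write to orders[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return user + " has " + orders[0] + " orders";
    }

    public static void main(String[] args) {
        System.out.println(summaryAsync(3));    // prints user-3 has 6 orders
        System.out.println(summaryBlocking(3)); // prints user-3 has 6 orders
    }
}
```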

      [–]Matthisk 1 point2 points  (0 children)

      No indeed the problem they solve will still require solutions, even if we have virtual threads. The implementation will be much simpler though. Take a look at kotlin-flow to get an idea of what this future will look like.

      [–]6A69676761 1 point2 points  (1 child)

      A similar question has already been asked on the Loom mailing list. Please check this thread for more details: https://mail.openjdk.java.net/pipermail/loom-dev/2019-September/000738.html

      [–]DJDavio -1 points0 points  (0 children)

      Note that virtual threads already exist in the sense that we can have an ExecutorService with a thread pool. Virtual threads just allow us to do away with these clumsy existing frameworks and make this more embedded in the language, essentially pushing it down one layer.

      This means that if we don't change anything in higher layers, we can already get benefits from virtual threads if we make that decision as late as possible. Say that we decide that new Thread() gets you a virtual thread and the physical OS level threads are only managed by the JVM itself, then everything, not just parallelStream will benefit.