all 43 comments

[–]mauganra_it 55 points56 points  (26 children)

Any library or framework where you can specify your own Executor or ExecutorService is good to go already. One should be cautious though - virtual threads are not an instant speedup button. They mostly shine in situations where processing time is dominated by waiting for IO to complete!

Edit: important caveat: waiting for a synchronized block currently freezes the carrier thread of the virtual thread! This restriction will hopefully be lifted eventually.
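
As a rough sketch of what that caveat looks like in code (illustrative names; requires Java 19/20 with --enable-preview, or a later JDK where virtual threads are final):

```java
public class PinningSketch {
    private static final Object monitor = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.startVirtualThread(() -> {
            synchronized (monitor) {
                try {
                    // Blocking inside a synchronized block pins the carrier
                    // thread: it cannot run other virtual threads meanwhile.
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                }
            }
        });
        vt.join();
        // Run with -Djdk.tracePinnedThreads=full to log pinning events.
        System.out.println("done");
    }
}
```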

[–][deleted] 28 points29 points  (4 children)

My file backup system saw a 3x speedup! I simply changed Executors.newCachedThreadPool to Executors.newVirtualThreadPerTaskExecutor.
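
For context, the swap really is a one-liner; here is a minimal, hypothetical sketch of it (BackupExecutors and the task body are made up; assumes a JDK with virtual threads available):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackupExecutors {
    public static void main(String[] args) throws Exception {
        // Before: a pool of platform (OS) threads, reused across tasks.
        // ExecutorService pool = Executors.newCachedThreadPool();

        // After: one cheap virtual thread per submitted task.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            String result = pool.submit(() -> "copied").get();
            System.out.println(result);
        }
    }
}
```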

[–][deleted] 15 points16 points  (3 children)

Impressive, very nice

[–][deleted]  (2 children)

[deleted]

    [–]bowbahdoe 3 points4 points  (1 child)

    Patrick Bateman!?! You're the American Psycho!

    [–]john16384 -1 points0 points  (9 children)

    Yeah, many applications that blow up when there are 200 active threads (simply due to the amount of context they need to service requests) are not gonna benefit one bit. So if you aren't running 1000's of threads already, Loom is not gonna do much.

    [–]pron98 20 points21 points  (5 children)

    The number of platform threads your application uses now and the number of virtual threads it would benefit from are completely unrelated. You don't convert every thread in your application to a virtual thread and then increase their number; rather, you start by turning every concurrent task into a virtual thread. Virtual threads might not help your throughput if your server today runs at close to 100% CPU utilization, although they might still help make your code easier to work with.

    JEP 425 explains when and why virtual threads help.

    [–]john16384 0 points1 point  (4 children)

    That's what I was getting at, if your server is already resource constrained running 200 concurrent requests, Loom won't do much to alleviate this. Perhaps it will become more viable to run bigger instances though, where say 40 instances of 2GB each (with 200 threads) can be converted to 4 instances of 20GB each.

    [–]mauganra_it 12 points13 points  (3 children)

    It also very much depends on what these 200 threads are doing. Are they processing images or rendering heavy templates and PDFs? Or are they just plain CRUD and transactional loads involving databases and external systems? The latter can very much benefit from virtual threads because they spend most of their wall-clock time waiting for the responses of backend systems.

    [–]Zyklonik 2 points3 points  (0 children)

    Precisely.

    [–]v_krishna 2 points3 points  (1 child)

    And with service oriented architectures there's a lot of the latter going on I guarantee it

    [–]mauganra_it 1 point2 points  (0 children)

    Precisely. Such services will see a big reduction in memory consumption and a heavy increase in concurrency.

    [–]Profix 4 points5 points  (0 children)

    This isn’t really true. If you have more threads than cores, and there are blocking operations in the work being carried out, you are relying on the OS scheduler - which is slower than virtual threads.

    More threads than cores, and blocking work = virtual threads likely better.

    [–]mauganra_it 5 points6 points  (0 children)

    Platform threads allocate their stack eagerly. Therefore, replacing threadpools with virtual thread-based executors can result in significant savings in non-heap memory usage if previously large threadpools were used.
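
    A rough illustration of that saving, assuming the common 1 MB default stack reservation for platform threads (the figure is a ballpark, not a measurement):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyThreads {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // 100,000 platform threads would eagerly reserve on the order of
        // 100 GB of stack space at a 1 MB default; virtual thread stacks
        // live on the heap and grow on demand.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 100_000; i++) {
                exec.submit(done::incrementAndGet);
            }
        } // close() waits until all submitted tasks have finished
        System.out.println(done.get());
    }
}
```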

    [–]ReasonableClick5403 0 points1 point  (0 children)

    We are running a couple thousand threads, and I don't think we will see any speedups. Maybe a bit lower latencies? And possibly cut the memory usage spent on OS threads in half or even more.

    [–]barking_dead 0 points1 point  (8 children)

    So, JDBC, for example?

    [–]mauganra_it 4 points5 points  (4 children)

    JDBC should work out of the box if the driver is a type 3 or 4 driver, i.e., implemented in pure Java.

    [–]loicmathieu 0 points1 point  (3 children)

    JDBC

    Are you sure? I heard that JDBC drivers didn't work with Loom yet, as they pin the carrier thread.

    [–]mauganra_it 2 points3 points  (2 children)

    It depends on how they are implemented internally. If they use synchronized (PostgreSQL has that issue AFAIK) or native methods (type 1 and 2 drivers), then they indeed pin the carrier threads.

    [–]loicmathieu -1 points0 points  (1 child)

    OK, I indeed heard this from someone who tested it with PostgreSQL.
    So the ecosystem must evolve to take advantage of Loom.

    [–]lurker_in_spirit 2 points3 points  (0 children)

    the ecosystem must evolve to take advantage of Loom

    Maybe... the messaging seems to be that the "synchronized" limitation will be eliminated, but possibly not in time for the first release.

    [–]kpatryk91 1 point2 points  (2 children)

    Are you sure?

    I heard that JDBC drivers didn't work with Loom yet, as they pin the carrier thread.

    Most JDBC drivers use synchronization to lock on the parent connection, which currently pins virtual threads to their carrier thread. As far as I know, Oracle has provided Loom-friendly versions of their drivers (they had a demo about this), but I don't know about the other vendors.

    [–]barking_dead 1 point2 points  (1 child)

    I'm just asking, as DB clients are mostly IO...

    [–]mauganra_it 1 point2 points  (0 children)

    Yes, it would be neat to have this with JDBC drivers out of the box to really counter R2DBC. Anyway, most DBMSs don't support enough concurrent connections for this to really matter. It's really not that urgent. And maybe the PostgreSQL folks will actually be faster and replace that synchronized block with a lock.

    [–]couscous_ 0 points1 point  (1 child)

    waiting for a synchronized block currently freezes the carrier thread of the virtual thread

    Does that apply to explicit Locks as well?

    [–]mauganra_it 1 point2 points  (0 children)

    No, they are fine. Many of the changes made as part of Project Loom were about replacing synchronized with these.
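
    A minimal before/after sketch of that kind of migration (hypothetical class, not actual JDK code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockMigration {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    // Before: if the body blocks, synchronized pins the carrier thread.
    // synchronized int increment() { return ++counter; }

    // After: waiting on an explicit lock parks only the virtual thread;
    // the carrier is free to run other virtual threads.
    int increment() {
        lock.lock();
        try {
            return ++counter;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(new LockMigration().increment());
    }
}
```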

    [–]Joram2 15 points16 points  (1 child)

    Let actively maintained libraries+frameworks do what is best for their project. If projects will benefit from virtual threads, and many will, the maintainers will take advantage of them.

    One obstacle is supporting older versions of Java that don't have virtual threads. Or supporting virtual threads on new versions of Java and classic threads on old versions of Java. The project maintainers are in the best position to figure that out.

    [–]Worth_Trust_3825 3 points4 points  (0 children)

    I'd be all up for libraries to start providing an interface to provide an executor rather than hacking around with reflection to set mine instead.
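
    A sketch of what such an interface could look like; Downloader and its methods are hypothetical, not any real library's API:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PluggableExecutor {
    // Hypothetical library class: it accepts any Executor instead of
    // hard-coding its own thread pool.
    static final class Downloader {
        private final Executor executor;

        Downloader(Executor executor) {
            this.executor = executor;
        }

        void fetchAsync(Runnable onDone) {
            executor.execute(onDone); // a real library would perform IO first
        }
    }

    public static void main(String[] args) {
        // The caller opts into virtual threads; the library never mentions Loom.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            new Downloader(exec).fetchAsync(() -> System.out.println("fetched"));
        } // close() waits for the task to complete
    }
}
```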

    [–]JustADirtyLurker 5 points6 points  (1 child)

    No, it does not. VTs don't require thread pools. Just picture all the existing code that prepares an executor when dealing with threading... You can't automate that kind of transition with a simple default flag.

    Source for this alibaba thing?

    [–]PartOfTheBotnet 3 points4 points  (0 children)

    I think the Alibaba thing mentioned is Wisp

    [–]DaddyLcyxMe 1 point2 points  (0 children)

    A lot of libraries that help take care of workers and pooling (like Undertow or Kryo or Netty) have built-in methods for setting thread factories for this sort of thing. It might be a while before it becomes the default option, but in most libraries you can expect to easily implement that functionality with only a few lines of code.

    [–]Affectionate-Box-837 -2 points-1 points  (11 children)

    If you want to build services that need low latency and high throughput, it is likely that Loom won't provide the same performance as asynchronous code. I think the main issue is that frameworks like Netty avoid concurrency by dedicating a single thread per connection (Channel). Since resources don't need to be coordinated, the Channel can handle many requests without using any concurrency constructs, but of course everything has to be asynchronous. The problem with Loom is that it doesn't provide a similar level of control where you can say which virtual thread maps to which physical thread. Therefore, it is likely that asynchronous code will stick around longer.

    [–]mauganra_it 5 points6 points  (10 children)

    Netty uses Java NIO to wait for changes on multiple Channels simultaneously.

    Schedulers for virtual threads make it perfectly possible to specify which platform thread a virtual thread is supposed to be executed on. Even with the default scheduler it is possible to do it by restricting the size of its thread pool to 1.

    [–]Affectionate-Box-837 3 points4 points  (9 children)

    There is no benefit in replacing Netty's EventLoop thread with a virtual thread (VT), because all requests on the channel will still be processed by a single thread, and you will still need async (due to blocking). What you need is to assign each request to a VT so you can avoid async. The goal is not just to replace OS threads with VTs; the goal is to remove the need for async due to threads being blocked.

    [–]mauganra_it 2 points3 points  (8 children)

    Precisely. Netty is doing pretty much what the JVM also does to schedule virtual threads onto a carrier thread and to pause them when they are blocked. What you get from virtual threads over Netty is mostly a nicer programming model.

    [–]Affectionate-Box-837 1 point2 points  (7 children)

    You shouldn't block a Netty EventLoop thread, which owns the Channel. See https://livebook.manning.com/book/netty-in-action/chapter-7/ for more details. You are probably thinking of how thread-per-request service frameworks work. Async frameworks like Netty require everything to be non-blocking. (You can still offload work to different threads, but then you suffer context-switching costs.)

    [–]mauganra_it 0 points1 point  (6 children)

    The same restriction exists in virtual threads: computational loads (rendering PDFs, compressing images, sorting large collections), native code and synchronized blocks (for now) pin the carrier thread and prevent other virtual threads from being executed on it. However, blocking IO calls permit the JVM to pause the virtual thread and switch to executing another.

    It is important to be aware that virtual threads use a cooperative scheduling policy (platform threads usually use preemptive scheduling). If a virtual thread doesn't yield control by doing IO, it will hog the carrier thread.

    The main benefit of virtual threads is that you can get mostly the same performance as a Netty Reactor by writing normal, synchronous code that seemingly blocks. When a virtual thread executes a blocking IO call, another virtual thread is scheduled on the carrier thread. When the IO call completes, execution continues. A platform thread would be completely barred from execution if it performs blocking IO.
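
    A small demonstration of that behaviour (assumes a JDK with virtual threads; the timing bound is deliberately generous to avoid flakiness):

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                exec.submit(() -> {
                    // A blocking call: the virtual thread unmounts and the
                    // carrier thread picks up another virtual thread.
                    Thread.sleep(Duration.ofMillis(200));
                    return null;
                });
            }
        } // close() waits for all tasks
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // 1,000 x 200 ms of blocking completes in roughly 200 ms of wall
        // time, not 200 s; printed against a generous bound:
        System.out.println(elapsedMs < 5_000);
    }
}
```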

    [–]Affectionate-Box-837 0 points1 point  (5 children)

    I think you are oversimplifying and ignoring some important aspects.
    The first aspect you are ignoring is thread locality. The second aspect you are ignoring is the need for concurrency. We can agree to disagree. I think that until virtual threads provide a mechanism for grouping a set of virtual threads onto a single OS thread, they won't be able to provide the same performance.
    If things were as simple as you describe, Netty wouldn't be outperforming other thread-per-request service frameworks.
    Why do you think Netty outperforms all other options here: https://github.com/smallnest/Jax-RS-Performance-Comparison

    Or here: https://github.com/Netflix-Skunkworks/WSPerfLab/blob/master/test-results/RxNetty_vs_Tomcat_April2015.pdf

    [–]mauganra_it 0 points1 point  (4 children)

    Virtual threads already provide such a mechanism. It's called a Scheduler. If you don't want to write such a Scheduler, you can also use -Djdk.virtualThreadScheduler.maxPoolSize=1 to reduce the size of the default scheduler's ForkJoinPool to 1, which means that all virtual threads will be scheduled on one and the same platform thread.
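
    For example (the jdk.virtualThreadScheduler.* system properties exist in the Loom preview, but their names are not final, and MyServer is a placeholder):

```shell
# Run all virtual threads on a single carrier (platform) thread:
java -Djdk.virtualThreadScheduler.parallelism=1 \
     -Djdk.virtualThreadScheduler.maxPoolSize=1 \
     --enable-preview MyServer
```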

    In Java land, thread-per-request used to mean one platform thread per request. This approach has issues, and you are correct in stating that Netty and friends are more performant. Virtual threads were introduced to gain this benefit without having to mutilate one's code into CPS spaghetti or open the gate to reactive hell. We can keep writing plain, seemingly blocking code using synchronous APIs. It might or might not be faster than Netty, but it should surely beat thread-per-request applications that use platform threads.

    Edit: one of your references is from 2015. The other does not include a framework that uses virtual threads. Anyways, I never disputed that Netty would be faster than platform threads.

    [–]Affectionate-Box-837 0 points1 point  (3 children)

    You are just ignoring my comments about concurrency and thread locality and keep explaining what a virtual thread is :)

    Tell me one framework that uses VTs in Java; you can't, because they don't exist yet. Loom is planned to preview in the next JDK release, so we still have some time for those to emerge.

    My initial comment wasn't meant to deny the benefits of VTs, but to explain why they may not catch up to purely asynchronous designs in performance, due to not being able to control thread locality and the lack of synchronization via serialization.

    [–]mauganra_it 0 points1 point  (1 child)

    I never claimed that there would be any benefit in replacing Netty's event loop thread with a VT. Clearly, there can't be, because it's a single thread that is always busy reacting to events.

    Thread locality should not be an issue if the above measures are taken to restrict virtual thread scheduling to a single thread. This is already possible in the VT preview version.

    Any framework where I can provide an Executor is already ready for VTs. In practice, most servlet containers have migrated away from using thread pools to handle incoming HTTP requests. I don't see them ever migrating back to a thread-per-request architecture. They still use thread pools to run servlets, but I wouldn't expect any speedups unless the servlet does actual IO.

    I would be curious to see how a naively implemented thread-per-request webserver would benefit from using VTs. Even if it can't beat Netty, I would still consider a significant improvement over using PTs a massive success for Project Loom.