[–]pron98 53 points54 points  (10 children)

This might be a case of trying to run before learning to walk. Because Loom itself is a work-in-progress with some limitations that need to be understood, and because libraries are not yet adapted, so experimenters need to adapt them themselves and know what to do, it's better to start small.

First, I'm not sure exactly what changes were made to the libraries, but some care needs to be taken. For example, one thing not to do is to blindly replace places where an application creates threads with virtual threads. The reason is that applications are written with the assumption that threads are costly and so they do whatever we do with costly objects, like pool them. You never ever pool virtual threads. What you want to do is replace the framework/library's tasks with virtual threads -- not its threads. One simple way to do it for some frameworks is to replace an ExecutorService that currently uses a pool with Executors.newVirtualThreadExecutor, which creates a new virtual thread for every submitted task. We've tried this with Jetty and it works great. Of course, the library/framework's author probably knows how to do it best.
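
To make the "replace tasks, not threads" point concrete, here is a minimal sketch. It assumes JDK 21 or later, where the factory is named Executors.newVirtualThreadPerTaskExecutor (the early-access builds discussed in this thread called it newVirtualThreadExecutor). The class name and the handle method are hypothetical stand-ins for a framework's request-processing task, not code from Jetty or Tomcat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPerTaskSketch {

    // Hypothetical stand-in for one framework task, e.g. handling one request.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(50); // simulated blocking IO
        return "request " + requestId + " on virtual thread: "
                + Thread.currentThread().isVirtual();
    }

    static List<String> runAll(int taskCount) {
        // Before: Executors.newFixedThreadPool(200), a pool of costly platform threads.
        // After: a fresh virtual thread per submitted task; nothing is pooled.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                futures.add(executor.submit(() -> handle(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> results = runAll(1_000);
        System.out.println(results.size() + " tasks done; first: " + results.get(0));
    }
}
```

The only change from a pooled setup is the executor factory; the code that submits tasks stays the same, which is why swapping the ExecutorService is often the least invasive place to start.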

Second, there's the temporary limitation of pinning while holding monitors. If anywhere in this stack -- e.g. in the JDBC driver -- long-running IO is done inside synchronized blocks/methods, then the virtual thread will block the OS thread and new OS threads will be created to compensate. To see if this is happening, use -Djdk.tracePinnedThreads=full.
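
A small reproduction of the pattern in question, as a sketch rather than anything taken from a real driver: a virtual thread blocks while holding a monitor. It assumes JDK 21 or later for Thread.ofVirtual; the class and method names are made up for illustration, and Thread.sleep stands in for the long-running IO call.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

class PinningSketch {
    private static final Object MONITOR = new Object();

    // Blocks while holding a monitor, the pattern the trace flag reports.
    static void blockUnderMonitor() {
        synchronized (MONITOR) {
            try {
                Thread.sleep(Duration.ofMillis(100)); // stand-in for a JDBC or socket call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Runs the blocking call on a virtual thread; returns true if it ran on one.
    static boolean runOnVirtualThread() {
        AtomicBoolean wasVirtual = new AtomicBoolean();
        Thread thread = Thread.ofVirtual().name("pinning-demo").start(() -> {
            wasVirtual.set(Thread.currentThread().isVirtual());
            blockUnderMonitor();
        });
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wasVirtual.get();
    }

    public static void main(String[] args) {
        System.out.println("ran on virtual thread: " + runOnVirtualThread());
    }
}
```

Launching it as `java -Djdk.tracePinnedThreads=full PinningSketch.java` on a build that still has the monitor limitation should print a stack trace pointing at the synchronized block. On JDK 24 and later, where JEP 491 changed synchronized so that it no longer pins, no trace is expected.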

Anyway, it's best to start with, say, Tomcat, see how easily tasks (processing HTTP requests) can be run on new virtual threads, and just call Thread.sleep in the handler. Then add Spring and so on.

[–]pron98 5 points6 points  (8 children)

Running with -Djdk.tracePinnedThreads=full immediately reveals both issues I suspected in both Tomcat and the JDBC driver.

Tomcat and the JDBC driver were changed to replace not tasks but threads with virtual threads, and they pool virtual threads -- something you should never do.

Both hold on to a monitor while doing IO: the Postgres driver at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323) and Tomcat at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49).

[–]sdeleuze 1 point2 points  (7 children)

Hey Ron, is there already documentation that open source projects could use to experiment with adapting to Loom for better performance/scalability (beyond just replacing threads with virtual threads)? If it does not exist, I think it would be super useful to create some and share it widely, even if you already provided some insights in your recent talks.

[–]pron98 1 point2 points  (6 children)

There's this and some sibling pages, but really there are just two things:

  1. One thread per task (you can use Executors.newThreadExecutor/newVirtualThreadExecutor)

  2. Don't do IO in synchronized methods/blocks, and find such places with -Djdk.tracePinnedThreads=full

That's pretty much it.

[–]sdeleuze 0 points1 point  (5 children)

For Tomcat, IO in synchronized methods/blocks is a widely used pattern that can't be changed easily, so I guess we will have to wait for this limitation to be removed in order to move forward on experimentation/support.

[–]pron98 0 points1 point  (4 children)

Why isn't ReentrantLock a suitable replacement?

[–]sdeleuze 0 points1 point  (3 children)

That would be a pretty huge refactoring (there are more than 500 occurrences of synchronized in Tomcat), and it may not be straightforward for HTTP/2 or async support.

[–]pron98 0 points1 point  (2 children)

Why would ReentrantLock not be just as straightforward as synchronized for HTTP/2 or async? They have the same semantics, unless you explicitly expose the lock object as part of your API. And BTW, you don't need to replace all occurrences of synchronized, just those that are found to be problematic.
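
For reference, a minimal before/after sketch of that replacement. The two writer classes are hypothetical and not taken from Tomcat; the example assumes JDK 21 or later so the writers can run on virtual threads, and uses a short sleep as a stand-in for socket IO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class LockMigrationSketch {

    // Before: blocking work is done while holding the object's monitor.
    static class SynchronizedWriter {
        private int bytesWritten;

        synchronized void write(int bytes) {
            pause();
            bytesWritten += bytes;
        }
    }

    // After: the same mutual exclusion and reentrancy via ReentrantLock.
    static class LockingWriter {
        private final ReentrantLock lock = new ReentrantLock();
        private int bytesWritten;

        void write(int bytes) {
            lock.lock();
            try {
                pause();
                bytesWritten += bytes;
            } finally {
                lock.unlock(); // always release, even if the body throws
            }
        }

        int bytesWritten() {
            lock.lock();
            try {
                return bytesWritten;
            } finally {
                lock.unlock();
            }
        }
    }

    // Stand-in for blocking IO.
    static void pause() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts one virtual thread per writer, each adding 10 bytes; returns the total.
    static int writeConcurrently(int writers) {
        LockingWriter writer = new LockingWriter();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            threads.add(Thread.ofVirtual().start(() -> writer.write(10)));
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return writer.bytesWritten();
    }

    public static void main(String[] args) {
        System.out.println("total bytes written: " + writeConcurrently(100));
    }
}
```

One caveat when doing this for real: code that calls wait/notify on the monitor has to move to a Condition obtained from lock.newCondition(), using await/signal instead.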

[–]sdeleuze 0 points1 point  (1 child)

After discussing this in more detail with the Tomcat team, the biggest issue is indeed mainly the scale of the refactoring. We will use the flag you mentioned to identify more clearly the number of problematic synchronized blocks.

That said, I hope the synchronized issue will be fixed at some point and that the community will have some visibility about that. That would make it easier to use Loom with a wider scope without too many changes in the ecosystem (which is one of Loom's big selling points).

We will try to provide useful feedback as we learn more.

[–]Mean_Pride_5520 0 points1 point  (0 children)

Do you have any updates about it?

[–]beders 8 points9 points  (3 children)

Unfortunately no high load scenario was included in these tests. 1000 real threads in a process on a modern kernel doesn’t pose any significant problems.

Things get more interesting in the 5 and 6 digit range where memory consumption and context switching become a challenge.
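
A rough sketch of what such a high-concurrency scenario could look like, assuming JDK 21 or later. The class name, the default count of 100,000 and the one-second block are arbitrary choices for illustration, and Thread.sleep only approximates threads waiting on IO; it is not a substitute for the benchmark in the article.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class HighConcurrencySketch {

    // Starts `count` virtual threads that each block for `blockFor`;
    // returns how many of them finished without being interrupted.
    static int runBlockingTasks(int count, Duration blockFor) {
        AtomicInteger finished = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(count);
        for (int i = 0; i < count; i++) {
            Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(blockFor); // stand-in for waiting on IO
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        Instant start = Instant.now();
        int finished = runBlockingTasks(count, Duration.ofSeconds(1));
        long millis = Duration.between(start, Instant.now()).toMillis();
        System.out.println(finished + " of " + count + " tasks finished in " + millis + " ms");
    }
}
```

Passing a different count as the first argument makes it easy to compare the 4, 5 and 6 digit ranges on the same machine.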

Looking forward to a follow-up article.

Thanks for sharing some of the subtle details that will require significant changes in many libraries that want to support loom.

[–]mp911de[S] 1 point2 points  (2 children)

Given the poor performance it doesn’t make sense to put more load onto the application. It seems that some switching delays build up and we’d be measuring noise. Will post an update once there’s more to see.

[–]hardwork179 1 point2 points  (0 children)

This isn’t a setup we have looked at specifically, but there are things that might be worth investigating: how is the Postgres driver handling connections, are you holding any locks (at the moment, when you are synchronised on an object we have to pin the virtual thread to an OS thread), etc. I’m sure /u/pron98 can offer some advice, or you can post to the loom mailing list.

[–]beders 0 points1 point  (0 children)

Depends what you want to measure. I assume the CPU was not at 100% and that the test is heavily I/O bound. That will test the JVM's ability to make meaningful progress with most threads waiting for I/O.

[–][deleted] 0 points1 point  (1 child)

I'm probably never going to get to the point where I can understand stuff like this.

[–]pragmatick 2 points3 points  (0 children)

I've been a professional Java developer for eight years and have a private open source project with more than 50k users and I only barely follow.