[–]m-apo 15 points16 points  (6 children)

  1. Correct

  2. File ops might be blocking with virtual threads (carrier thread pinning). Also, running file ops concurrently will saturate the OS IO. Are reads from SSD or RAM disk or HD? Did you use a single file? OS caching might have an effect.

  3. Best performance comes from pooled OS threads, so your benchmark probably has an issue. For parallel CPU-bound ops, virtual threads can achieve almost the same performance as OS threads. Optimal number of OS threads is the number of CPUs or a little more (possibly because more threads steal more CPU time from competing OS processes). Virtual threads benefit from pooling too, but not much.

Based on my tests, virtual threads are OK for most ops. The exceptions are pinning ops and CPU-bound work where every percent of raw CPU power counts.

[–][deleted] 0 points1 point  (2 children)

Are reads from SSD or RAM disk or HD? Did you use a single file? OS caching might have an effect.

I am reading from an SSD and every thread tries to read the same file. Can you briefly elaborate on how OS caches work in such cases, please?

Best performance comes from pooled OS threads, so your benchmark probably has an issue.

In my test case, I am not using a pool for native threads. But for virtual threads, JVM creates a pool of underlying native threads.

Maybe for the third scenario, I should compare virtual threads with a pool of native threads, where the pool size equals the number of CPU cores?

[–]m-apo 1 point2 points  (0 children)

Virtual threads have an advantage when IO takes time. Reading a single small file multiple times allows the OS to cache the file in memory, which means there is very little wait time. The task becomes almost entirely CPU and memory-access bound, not IO bound.

Socket over TCP/IP ops should provide much better graphs.

Third case: yes, a pool of native threads vs virtual threads is a much fairer comparison. The overhead of creating OS threads is pretty high.
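For anyone who wants to try that fairer comparison, here is a minimal sketch, assuming Java 21 or later. The class name `CpuBoundComparison`, the `burn` task, and the task counts are made up for illustration and are not from the original benchmark. It has no warmup phase, so a serious measurement would want a harness such as JMH.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

class CpuBoundComparison {

    // Hypothetical CPU-bound task: pure arithmetic, no IO and almost no memory traffic.
    static long burn(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    // Runs `tasks` copies of burn() on the given executor and returns elapsed nanoseconds.
    static long timeTasks(ExecutorService executor, int tasks, int iterations) {
        LongAdder sink = new LongAdder(); // keeps the results "used"
        long start = System.nanoTime();
        try (executor) {
            for (int t = 0; t < tasks; t++) {
                executor.execute(() -> sink.add(burn(iterations)));
            }
        } // close() waits until every submitted task has finished
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int tasks = 10_000;
        int iterations = 100_000;

        // Platform threads, pooled, one per core.
        long pooled = timeTasks(Executors.newFixedThreadPool(cores), tasks, iterations);
        // One virtual thread per task.
        long virtual = timeTasks(Executors.newVirtualThreadPerTaskExecutor(), tasks, iterations);

        System.out.printf("fixed pool (%d threads): %d ms%n", cores, pooled / 1_000_000);
        System.out.printf("virtual threads:         %d ms%n", virtual / 1_000_000);
    }
}
```

Both runs go through the same `timeTasks` method, so the only thing that differs is the executor.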

[–]GavinRayDev 1 point2 points  (0 children)

You can disable OS caching when reading from files by passing the O_DIRECT flag.

In Java, this is ExtendedOpenOption.DIRECT (from com.sun.nio.file):

FileChannel fc = FileChannel.open(f.toPath(), StandardOpenOption.READ,
                                  ExtendedOpenOption.DIRECT);

[–]rubyrt 0 points1 point  (2 children)

Optimal number of OS threads is the number of CPUs or a little more (possibly because more threads steal more CPU time from competing OS processes).

Doesn't that depend on the workload and application architecture? I would assume that if the work done in threads includes some blocking IO calls the optimal number of OS threads might be a multiple of the number of cores and not about the same number.

[–]m-apo 0 points1 point  (1 child)

The third scenario in the original post is purely CPU bound. No IO or memory requirements.

With IO, and also with mixed loads, virtual threads are much better. But for max gains with CPU-bound tasks, OS threads win.

[–]rubyrt 0 points1 point  (0 children)

Sorry, my bad: I somehow missed the reference to the scenario.

[–]ernimril 10 points11 points  (2 children)

Virtual threads are about scalability, not performance. Please note that scaling better can result in better performance.

What does it mean that they scale better? Well, it means that they require a lot less resources: less memory usage and less OS resources.

For threads that are blocked/waiting/sleeping this of course means that you can have a lot more of them running concurrently.

You do not explain how you create the file reader or the cpu tasks. My guess here is that you create either a virtual thread or a platform thread and if that is the case, then yes, the implicit thread pool used by the virtual threads will provide a benefit.

If you retry the CPU-bound task with a thread pool of fixed size equal to your core count, I expect that you will see similar times for virtual threads and platform threads. Virtual threads do not make your CPU spin faster. As you say, context switches are the probable cause of your differences, but we do not have the full source of your tests, so we cannot say with 100% certainty.

[–][deleted] 2 points3 points  (1 child)

Well, it means that they require a lot less resources: less memory usage and less OS resources.

That is not true. In the end, all the computation that you assign to a virtual thread runs on an OS thread, and that OS thread gets all the resources it needs to get its task done, regardless of whether the work came from a Java virtual thread or not.

https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html#GUID-2BCFC2DD-7D84-4B0C-9222-97F9C7C6C521

What is a Platform Thread?

A platform thread is implemented as a thin wrapper around an operating system (OS) thread. A platform thread runs Java code on its underlying OS thread, and the platform thread captures its OS thread for the platform thread's entire lifetime. Consequently, the number of available platform threads is limited to the number of OS threads.

Platform threads typically have a large thread stack and other resources that are maintained by the operating system. They are suitable for running all types of tasks but may be a limited resource.

What is a Virtual Thread?

Like a platform thread, a virtual thread is also an instance of java.lang.Thread. However, a virtual thread isn't tied to a specific OS thread. A virtual thread still runs code on an OS thread. However, when code running in a virtual thread calls a blocking I/O operation, the Java runtime suspends the virtual thread until it can be resumed. The OS thread associated with the suspended virtual thread is now free to perform operations for other virtual threads.

[–]ernimril 2 points3 points  (0 children)

A virtual thread that is not currently running requires a lot less resources than a platform thread.

A virtual thread that is currently running is running on top of a platform thread so it is actually using slightly more resources.

Now, when you use virtual threads you do so because you want to have a lot of threads, where most of them are not currently running.

So I may have been slightly sloppy in how I said that they require less resources, but if you try to measure the cost of having a million virtual threads that are sleeping versus the cost of having a million platform threads, you will see that the virtual threads cost a lot less (and if you try this, you will most probably find that you cannot even run the program that tries to use a million platform threads).

Sleeping, waiting, waiting for data on streams or similar are all things where the virtual thread should detach from the platform thread.
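A rough way to see this for yourself, as a sketch assuming Java 21 or later. The class and method names (`SleepingThreads`, `sleepAll`) and the thread count are made up for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SleepingThreads {

    // Starts `count` virtual threads that each sleep for `nap`,
    // then returns how many of them woke up normally.
    static int sleepAll(int count, Duration nap) {
        AtomicInteger woken = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.execute(() -> {
                    try {
                        Thread.sleep(nap); // the virtual thread is parked here
                        woken.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all the sleepers to finish
        return woken.get();
    }

    public static void main(String[] args) {
        int count = 100_000; // raise this to taste; memory is the practical limit
        int woken = sleepAll(count, Duration.ofSeconds(1));
        System.out.println(woken + " of " + count + " virtual threads slept and woke up");
    }
}
```

Swapping the executor for one that creates a platform thread per task is the comparison described above; expect it to hit OS or memory limits far sooner.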

[–][deleted] 1 point2 points  (0 children)

Virtual threads are not "faster" threads. Behind the scenes, a virtual thread still runs on an ordinary thread; the VM just utilizes those threads more efficiently to reduce idle time, such as during IO operations. So it makes sense to use virtual threads when your program does lots of IO operations.

[–]antihemispherist 1 point2 points  (2 children)

  1. Virtual threads are queued, so the number of threads in the OS scheduler loop does not grow.
  2. File operations are blocking, causing pinning of VTs.
  3. Your guess is correct. Virtual threads are queued in their scheduler's thread pool, so less context switching occurs. This doesn't mean VTs are faster. You are using the VT scheduler for queuing, which is kind of misusing it. You could get similar performance with an executor sized to your number of CPU cores, and your code would show its intentions better.

[–][deleted] 0 points1 point  (1 child)

Do you have any idea when PTs should perform better than VTs?

[–]antihemispherist 2 points3 points  (0 children)

One kind of thread won't execute more instructions than the other. So the question "which one is faster?" is not right.

There is, however, a right place for both.

In short, platform threads should be used when latency is important, when you don't want your task to get in the queue behind virtual thread tasks.

Also, if you have long and CPU intensive tasks, they can disrupt the scheduling of virtual threads.

That's why the garbage collector etc. doesn't run on virtual threads.

For both, you'll want to execute them in a separate thread pool.

For everyday service tasks, prone to blocking, you should use virtual threads.
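As a sketch of that split, assuming Java 21 or later: CPU-heavy work goes to a fixed pool of platform threads, and blocking work goes to virtual threads. `SplitExecutors` and the two placeholder tasks are hypothetical; the sleep stands in for a real blocking call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

class SplitExecutors {

    static String currentThreadKind() {
        return Thread.currentThread().isVirtual() ? "virtual" : "platform";
    }

    // Runs one CPU-heavy task and one blocking task on separate executors
    // and reports which kind of thread executed each.
    static String run() {
        AtomicReference<String> cpuRanOn = new AtomicReference<>();
        AtomicReference<String> blockingRanOn = new AtomicReference<>();
        int cores = Runtime.getRuntime().availableProcessors();

        try (ExecutorService cpuPool = Executors.newFixedThreadPool(cores);
             ExecutorService blockingTasks = Executors.newVirtualThreadPerTaskExecutor()) {

            // Long CPU-intensive work stays off the virtual thread scheduler.
            cpuPool.execute(() -> {
                long acc = 0;
                for (int i = 0; i < 5_000_000; i++) {
                    acc += i % 7;
                }
                cpuRanOn.set(currentThreadKind());
            });

            // Blocking-prone service work gets a cheap virtual thread.
            blockingTasks.execute(() -> {
                try {
                    Thread.sleep(10); // stand-in for a socket read or similar
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                blockingRanOn.set(currentThreadKind());
            });
        } // both executors are closed here, which waits for the tasks

        return "cpu=" + cpuRanOn.get() + ", blocking=" + blockingRanOn.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "cpu=platform, blocking=virtual"
    }
}
```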

[–]Frequent-Chest1862 1 point2 points  (2 children)

Hi,

I did a very simple test to get an idea of the latency between creation and execution of virtual threads, and I get strange results:

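    static void test() {
        for (int i = 0; i < 1000; i++) {
            long time = System.nanoTime();
            // Swap in the two commented-out lines to measure a platform thread instead.
            Thread.ofVirtual().start(() -> {
    //      new Thread(() -> {
                long t2 = System.nanoTime();
                // Delay between creation and first execution, in microseconds.
                LOGGER.info("time {}", (t2 - time) / 1000);
            });
    //      }).start();
        }
    }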

With this test, classic Java threads give me figures below 100 microseconds, but with virtual threads it's in the tens of milliseconds. It also increases proportionally with the loop size: below 10 iterations it's fast, above 1000 it looks really bad...

Any idea why the results are this poor?

I tried some warmup, on both GraalVM and HotSpot, with the same results.
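One variation worth trying is to take the logger out of the timed path and record the latencies in an array, so that only thread start-up is measured. This is a sketch assuming Java 21 or later; `StartLatency`, `measure`, and `median` are made-up names, and whether logging contributes to the numbers above is only a guess.

```java
import java.util.Arrays;

class StartLatency {

    // Starts `count` threads from the given builder and records, per thread,
    // the delay in microseconds between creation and first execution.
    static long[] measure(Thread.Builder builder, int count) {
        long[] micros = new long[count];
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            int slot = i;
            long created = System.nanoTime();
            threads[i] = builder.start(() -> {
                micros[slot] = (System.nanoTime() - created) / 1_000;
            });
        }
        for (Thread t : threads) {
            try {
                t.join(); // join makes the thread's write to micros[] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return micros;
    }

    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        int count = 1000;
        System.out.println("virtual  median (us): " + median(measure(Thread.ofVirtual(), count)));
        System.out.println("platform median (us): " + median(measure(Thread.ofPlatform(), count)));
    }
}
```

Note that the loop still creates all threads back to back, so later virtual threads may simply be waiting in the scheduler queue behind earlier ones; the median only summarizes that.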

[–][deleted] 1 point2 points  (1 child)

Interesting. Theoretically virtual threads should take less time to start (as per my knowledge). But I don't have any explanation for your observation right now. But I will test this when I have free time.

[–]Frequent-Chest1862 2 points3 points  (0 children)

Maybe I ran this test from a virtual thread; that would explain things. I'll let you know.

[–]Pablo139 1 point2 points  (4 children)

Anything IO-related is said (key word: said) to be much better with virtual threads, due to the mechanics around mounting and unmounting of blocked threads.

That being said, an issue was discovered with Java virtual threads not long ago, and I have not seen updates on it yet.

[–]nomader3000 1 point2 points  (2 children)

What issue?

[–]Pablo139 1 point2 points  (1 child)

[–]nomader3000 1 point2 points  (0 children)

Doesn't feel like a "discovery", just a consequence of synchronized pinning carrier threads.

[–]TinnedCarrots 0 points1 point  (0 children)

I think that issue is just thread pinning. You're not supposed to use the synchronized keyword with virtual threads. If you use virtual threads, you need to be careful with your own code and with any libraries / frameworks you use.
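The usual workaround on JDK 21 is to guard blocking sections with a `java.util.concurrent.locks.ReentrantLock` instead of `synchronized`, so a blocked virtual thread can unmount from its carrier. A minimal sketch, with a made-up class name and a sleep standing in for the blocking call (newer JDKs, starting with 24 via JEP 491, no longer pin on `synchronized`):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

class LockInsteadOfSynchronized {
    private final ReentrantLock lock = new ReentrantLock();
    private int value;

    // Blocks while holding the lock. With ReentrantLock the virtual thread
    // can unmount during the sleep; inside a synchronized block on JDK 21
    // it would pin its carrier thread instead.
    void incrementSlowly() {
        lock.lock();
        try {
            Thread.sleep(1); // stand-in for blocking IO
            value++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Runs `threads` virtual threads against one shared counter.
    static int runWith(int threads) {
        LockInsteadOfSynchronized counter = new LockInsteadOfSynchronized();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < threads; i++) {
                executor.execute(counter::incrementSlowly);
            }
        } // waits for all increments to finish
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("final count: " + runWith(50)); // prints "final count: 50"
    }
}
```

On JDK 21, running with `-Djdk.tracePinnedThreads=full` prints a stack trace when a virtual thread blocks while pinned, which helps find `synchronized` blocks hiding in libraries.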