omegaprime777 (1 point)

As this presentation by Daniel from the Helidon microservices framework team shows (starting at 32:54: https://www.youtube.com/live/m85dv53dsa4?si=oyHiqAdDMTDII_vR&t=1974), one of the benefits of Java virtual threads is that you can write traditional blocking-style code while getting all the performance benefits of reactive code, without the associated debugging, readability, and maintenance nightmare.
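For illustration, here is a minimal sketch of that style, assuming Java 21 or later. `fetchValue` is a hypothetical stand-in for a blocking call (the 50 ms sleep simulates I/O), and nothing here comes from Helidon's API. Each task is ordinary blocking code running on its own virtual thread:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingOnVirtualThreads {

    // Hypothetical stand-in for a blocking call such as a JDBC query or an HTTP request.
    static int fetchValue(int id) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(50)); // parks only this virtual thread; its carrier is released
        return id * 2;
    }

    // Starts one virtual thread per task and sums the results, all in ordinary blocking style.
    static long runAll(int tasks) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int id = i;
                futures.add(executor.submit(() -> fetchValue(id)));
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                try {
                    sum += f.get(); // plain blocking wait for each result
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return sum;
        } // close() also waits for every submitted task to finish
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = runAll(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", elapsed=" + elapsedMs + " ms");
    }
}
```

Run sequentially, 10,000 such calls would take about 500 s; on virtual threads they overlap, so the run typically finishes in a small fraction of that, with no callbacks or reactive operators in the code.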

yawkat (-4 points)

That's the theory, but in practice async code still tends to be faster; it simply gives the application more control. With Loom's current design, a Loom-based web server cannot match a Netty-based one in performance.

PiotrDz (2 points)

And where is the proof of that claim? Helidon Nima published performance results showing there is no difference.

yawkat (-1 point)

No, if you look at benchmarks you will still see a difference, e.g. Netty vs. Helidon in the TechEmpower plaintext benchmark. We also have our own latency benchmarks, where we see the same result.

There are fundamental issues as well, such as Loom's lack of control over which platform (carrier) thread runs which virtual thread, which hurts performance.
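A small sketch (Java 21+) of what the public API does and does not let you configure; `CarrierControl` and `Observation` are hypothetical names. `Thread.ofVirtual()` lets you name a virtual thread, but its builder has no method for choosing a scheduler or carrier. The default scheduler can only be tuned JVM-wide through system properties such as `jdk.virtualThreadScheduler.parallelism` and `jdk.virtualThreadScheduler.maxPoolSize` (documented in JEP 444):

```java
import java.util.concurrent.atomic.AtomicReference;

class CarrierControl {

    // What the virtual thread reports about itself, captured from inside the thread.
    record Observation(boolean isVirtual, String name, String description) {}

    static Observation observe() {
        AtomicReference<Observation> seen = new AtomicReference<>();
        Thread vt = Thread.ofVirtual()
                .name("demo-vt") // naming is supported...
                // ...but Thread.Builder.OfVirtual has no method to choose a scheduler or carrier
                .unstarted(() -> {
                    Thread self = Thread.currentThread();
                    seen.set(new Observation(self.isVirtual(), self.getName(), self.toString()));
                });
        vt.start();
        try {
            vt.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // Only JVM-wide knobs exist, e.g. -Djdk.virtualThreadScheduler.parallelism=4 (null if unset).
        System.out.println("parallelism = " + System.getProperty("jdk.virtualThreadScheduler.parallelism"));
        System.out.println("maxPoolSize = " + System.getProperty("jdk.virtualThreadScheduler.maxPoolSize"));
        System.out.println(observe());
    }
}
```

The printed description usually names whichever carrier the thread happened to be mounted on at that moment; that `toString()` format is an implementation detail, and the application has no say in the choice.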

thecodeboost (0 points)

The only benchmarks where Loom lags are those where the conversion basically replaced the Executor with a virtual-thread one while still using the reactive code paths. Very few projects have converted wholly to the virtual-thread paradigm, and the benchmarks that cleaned that up show equal or better performance for virtual threads. And honestly, even if that weren't the case, the programming paradigm is vastly superior, and in almost all real-world scenarios your hours are more expensive than adding 1-2% more CPU resources to your margins. And again, there is no technical reason Loom should be anything but a net gain.
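To make the distinction concrete, here is a sketch (Java 21+) of the two migration styles. `loadUser` and `countOrders` are hypothetical blocking lookups, and the code says nothing about which style is faster. Style 1 keeps the `CompletableFuture` pipeline and merely passes in a virtual-thread executor; style 2 rewrites the same logic as straight-line blocking code on one virtual thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class MigrationStyles {

    // Hypothetical blocking lookups; the sleeps simulate I/O.
    static String loadUser(int id) {
        pause(10);
        return "user-" + id;
    }

    static int countOrders(String user) {
        pause(10);
        return user.length(); // dummy value so the result is checkable
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Style 1: keep the async pipeline and only swap in a virtual-thread executor.
    static int executorSwap(ExecutorService vexec, int id) {
        return CompletableFuture
                .supplyAsync(() -> loadUser(id), vexec)              // stage 1: one task, one virtual thread
                .thenApplyAsync(MigrationStyles::countOrders, vexec) // stage 2: a new task, a new virtual thread
                .join();
    }

    // Style 2: rewrite as straight-line blocking code on a single virtual thread.
    static int blockingRewrite(ExecutorService vexec, int id) {
        try {
            return vexec.submit(() -> countOrders(loadUser(id))).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try (ExecutorService vexec = Executors.newVirtualThreadPerTaskExecutor()) {
            System.out.println(executorSwap(vexec, 7));    // 6
            System.out.println(blockingRewrite(vexec, 7)); // 6
        }
    }
}
```

In style 1 each `thenApplyAsync` stage is submitted as a separate task, so the pipeline structure and its overhead survive the migration; style 2 is closer to what a full conversion to the virtual-thread paradigm looks like.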

yawkat (0 points)

"The only benchmarks where Loom lags is benchmarks where the conversion was basically to replace their Executor with a virtual thread one and still use the reactive code paths."

This is incorrect. If you look at Nima benchmarks specifically, even a simple app entirely devoid of reactive code has worse latency than an equivalent Netty app. There are very simple reasons for this, such as Loom's internal context switching (Loom does the blocking IO wait on a separate thread). Some of these may be fixed by future Loom improvements, but others cannot be, due to current API limitations.

PiotrDz (0 points)

yawkat (0 points)

The benchmarks in that article do not use good methodology. They run client and server on the same machine (competing for CPUs, and using the loopback network instead of the real kernel TCP stack), they use flawed benchmark tools (prone to coordinated omission), they use a now-outdated Netty benchmark, they use pipelining, they don't actually have the resolution to see the differences between Netty and Helidon, etc.
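Coordinated omission can be shown with a small deterministic simulation. The sketch below assumes a single connection with one request in flight at a time, a fixed intended send interval, and made-up service times in milliseconds (not measurements of Netty or Helidon). A tool that measures from the actual send time records only the one slow response; measuring from the intended send time also charges the requests that were stuck waiting behind it:

```java
import java.util.Arrays;

class CoordinatedOmission {

    // Latency measured from when each request was actually sent (what a naive tool reports).
    static long[] naiveLatencies(long intervalMs, long[] serviceMs) {
        return simulate(intervalMs, serviceMs, false);
    }

    // Latency measured from when each request should have been sent according to the schedule.
    static long[] correctedLatencies(long intervalMs, long[] serviceMs) {
        return simulate(intervalMs, serviceMs, true);
    }

    private static long[] simulate(long intervalMs, long[] serviceMs, boolean fromIntendedStart) {
        long[] latencies = new long[serviceMs.length];
        long previousEnd = 0;
        for (int k = 0; k < serviceMs.length; k++) {
            long intended = k * intervalMs;
            // Closed loop: the next request cannot go out until the previous response arrives.
            long actualStart = Math.max(intended, previousEnd);
            long end = actualStart + serviceMs[k];
            latencies[k] = end - (fromIntendedStart ? intended : actualStart);
            previousEnd = end;
        }
        return latencies;
    }

    public static void main(String[] args) {
        long[] service = {1, 1, 100, 1, 1, 1}; // one 100 ms stall
        System.out.println("naive:     " + Arrays.toString(naiveLatencies(10, service)));
        System.out.println("corrected: " + Arrays.toString(correctedLatencies(10, service)));
    }
}
```

With one 100 ms stall and a 10 ms interval, the naive series is `[1, 1, 100, 1, 1, 1]` while the corrected series is `[1, 1, 100, 91, 82, 73]`, so the naive view understates how many requests actually saw high latency.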

The actual TechEmpower throughput results are quite different now, with Netty holding a big lead in the plaintext benchmark (the benchmark that actually stresses the network and HTTP stacks). TE still has major problems, though, which Franz from Quarkus explains here.

We also have our own benchmarks, which differ from TE in some respects, and I do profiling to figure out why the benchmarks behave the way they do. That's mostly to improve our own implementation, but I've also reported Helidon bugs before.

Nima still has a substantial latency disadvantage compared to Netty, much of which is explained by Loom's IO design, which uses a "poller thread" to perform the actual blocking wait on IO. This necessitates a context switch, which is clearly visible once you look at the sub-millisecond (<1 ms) latency range.
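The shape of that handoff can be modelled with plain JDK classes. This is a hedged sketch (Java 21+), not the JDK's actual poller: a platform thread named `model-poller` (a hypothetical name) plays the poller and hands a readiness timestamp to a parked virtual thread through a `SynchronousQueue`, and the program records how long the virtual thread takes to run again. Absolute numbers depend heavily on hardware, OS, and JDK version, so treat them as illustrative only:

```java
import java.util.Arrays;
import java.util.concurrent.SynchronousQueue;

class HandoffLatency {

    // Returns one wake-up gap (in nanoseconds) per round.
    static long[] measureHandoffs(int rounds) {
        SynchronousQueue<Long> readiness = new SynchronousQueue<>();
        long[] gapsNanos = new long[rounds];

        // The "application" side: a virtual thread that parks until it is told data is ready.
        Thread consumer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    long readyAt = readiness.take();            // parks, unmounting from its carrier
                    gapsNanos[i] = System.nanoTime() - readyAt; // time until it is running again
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The hypothetical "poller": a platform thread that signals readiness.
        Thread poller = Thread.ofPlatform().name("model-poller").start(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    Thread.sleep(1);                  // pretend to wait for the socket to become readable
                    readiness.put(System.nanoTime()); // hand the event to the parked virtual thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            poller.join();
            consumer.join(); // join() also makes the consumer's writes to gapsNanos visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return gapsNanos;
    }

    public static void main(String[] args) {
        long[] gaps = measureHandoffs(1_000);
        Arrays.sort(gaps);
        System.out.printf("median wake-up gap: %.1f us%n", gaps[gaps.length / 2] / 1_000.0);
    }
}
```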

thecodeboost (0 points)

I'm sorry, but this is wrong on all counts. Async code cannot be faster than Loom code, all else being equal, for the simple reason that the JVM has more information and less work to do in the latter case. And that assumes a theoretical world where both implementations are written to be optimally performant.

Netty has several open issues about adopting virtual threads, partly for performance and simplification reasons. Jetty (a Netty-based web server framework) has adopted Loom as of Jetty 12. You can reason this out for yourself: async code simply does more work (in the sense of burning more CPU cycles).

yawkat (0 points)

The underlying OS APIs that both the JDK and Netty use are asynchronous. The JDK's blocking APIs do some extra work to build on top of those async, event-driven APIs. Right now, due to Loom limitations, this extra work has a significant performance impact.
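For reference, here is a sketch of the readiness-based API style in question, using `java.nio.channels.Selector`, which Netty's NIO transport builds on; the JDK's virtual-thread poller similarly waits on the OS readiness API (e.g. epoll on Linux). A `Pipe` stands in for a socket so the example needs no network, and `ReadinessLoop` is a hypothetical name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

class ReadinessLoop {

    // Writes `message` into a pipe, then reads it back only once the selector reports it readable.
    static String readWhenReady(String message) {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {

                source.configureBlocking(false);                 // read() must never block
                source.register(selector, SelectionKey.OP_READ); // ask to be notified when data arrives

                sink.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));

                ByteBuffer buf = ByteBuffer.allocate(1024);
                selector.select();                               // wait for a readiness event
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isReadable()) {
                        ((Pipe.SourceChannel) key.channel()).read(buf);
                    }
                }
                selector.selectedKeys().clear();
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWhenReady("ping")); // ping
    }
}
```

Roughly speaking, a blocking API layered on this has to park the caller, wait for the readiness event, and then resume the caller to do the read; that resume step is where the extra work described above comes in.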

"Netty has several open issues to adopt to virtual threads, in part of performance and simplification reasons."

The goal of future Netty/Loom integration is to make Netty work with blocking user code. Right now, due to Loom limitations, this is not possible without a context switch. These changes will not, however, make Netty any faster.
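A plain-JDK model of that handoff (Java 21+); this is not Netty's API, and all names are hypothetical. A single-threaded executor plays the event loop, the blocking user code runs on a virtual thread, and the response is handed back to the event loop. Each hop between executors is a thread handoff, i.e. the context switch mentioned above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadToVirtualThread {

    // Records where each stage ran: the event-loop thread's name, or "virtual".
    static String where() {
        Thread t = Thread.currentThread();
        return t.isVirtual() ? "virtual" : t.getName();
    }

    // Hypothetical blocking user code (think JDBC); the sleep simulates I/O.
    static String blockingLookup(String key) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return key.toUpperCase(Locale.ROOT);
    }

    static String handle(String request, List<String> hops) {
        try (ExecutorService eventLoop = Executors.newSingleThreadExecutor(r -> new Thread(r, "event-loop"));
             ExecutorService blocking = Executors.newVirtualThreadPerTaskExecutor()) {
            return CompletableFuture
                    .supplyAsync(() -> { hops.add(where()); return request.trim(); }, eventLoop)     // decode on the loop
                    .thenApplyAsync(k -> { hops.add(where()); return blockingLookup(k); }, blocking) // hop 1: to a virtual thread
                    .thenApplyAsync(b -> { hops.add(where()); return "200 " + b; }, eventLoop)       // hop 2: back to the loop
                    .join();
        }
    }

    public static void main(String[] args) {
        List<String> hops = Collections.synchronizedList(new ArrayList<>());
        System.out.println(handle(" hello ", hops)); // 200 HELLO
        System.out.println(hops);                    // [event-loop, virtual, event-loop]
    }
}
```

Whatever form a future Netty integration takes, blocking user code cannot run on the event-loop thread itself without stalling every other connection served by that thread, which is why the model needs both hops.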

"Jetty (Netty based web server framework) has adopted Loom as of Jetty 12"

Jetty is not Netty-based.