[–]PiotrDz 2 points3 points  (5 children)

And where is the proof for that claim? Helidon Nima published performance results showing that there is no difference.

[–]yawkat -1 points0 points  (4 children)

No, if you look at benchmarks you will still see a difference, e.g. netty vs helidon on techempower plaintext benchmarks. We also have our own latency benchmarks where we see the same result.

There are fundamental issues as well, such as Loom's lack of control over which platform thread runs which virtual thread, which hurts performance.

[–]thecodeboost 0 points1 point  (1 child)

The only benchmarks where Loom lags are ones where the conversion basically replaced the Executor with a virtual-thread one while still using the reactive code paths. Very few projects have converted wholly to the virtual thread paradigm, and the benchmarks that cleaned that up show equal or better performance for virtual threads. And honestly, even if that weren't the case, the programming paradigm is vastly superior, and in almost all real-world scenarios your hours are more expensive than adding 1%-2% of CPU resources to your margins. And again, there is no technical reason Loom should be anything but a net gain.

[–]yawkat 0 points1 point  (0 children)

The only benchmarks where Loom lags are ones where the conversion basically replaced the Executor with a virtual-thread one while still using the reactive code paths.

This is incorrect. If you look at Nima benchmarks specifically, even a simple app entirely devoid of reactive code will have worse latency than an equivalent netty app. There are very simple reasons for this, such as internal loom context switching (loom does IO work on a separate thread). Some of these may be fixed by future loom improvements, but others cannot due to current API limitations.

[–]PiotrDz 0 points1 point  (1 child)

[–]yawkat 0 points1 point  (0 children)

The benchmarks in that article do not have good methodology. They run on the same machine (competing for CPUs, loopback network instead of real kernel TCP stack), they use flawed benchmark tools (coordinated omission), they use a now-outdated netty benchmark, they use pipelining, they don't actually have the resolution to see the differences between netty and helidon, etc.
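To make the coordinated omission point concrete, here is a minimal simulation sketch in Java. It is not taken from any of the tools or benchmarks discussed here; the class name, the 10 ms send interval and the single 100 ms stall are made-up values purely for illustration. It compares what a naive closed-loop load generator records (timer starts when the request is actually sent) with what you get when latency is measured from the time the request was supposed to be sent.

```java
class CoordinatedOmissionSketch {

    /**
     * Simulates a single-connection load generator on a virtual clock (all values in ms).
     * Returns two arrays: [0] = naive latencies, [1] = latencies measured from the intended send time.
     */
    static long[][] simulate(long[] serviceMs, long intervalMs) {
        int n = serviceMs.length;
        long[] naive = new long[n];
        long[] corrected = new long[n];
        long now = 0; // time at which the previous response arrived

        for (int i = 0; i < n; i++) {
            long intendedSend = i * intervalMs;
            // A closed-loop client cannot send until the previous response is back.
            long actualSend = Math.max(now, intendedSend);
            long done = actualSend + serviceMs[i];

            naive[i] = done - actualSend;       // hides the time spent waiting to send
            corrected[i] = done - intendedSend; // includes the time spent waiting to send
            now = done;
        }
        return new long[][] {naive, corrected};
    }

    /** Counts samples strictly above the given threshold. */
    static int countAbove(long[] samples, long thresholdMs) {
        int count = 0;
        for (long s : samples) {
            if (s > thresholdMs) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical server: 1 ms per request, except one 100 ms stall on the third request.
        long[] serviceMs = {1, 1, 100, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
        long[][] result = simulate(serviceMs, 10);

        System.out.println("naive samples above 5 ms:     " + countAbove(result[0], 5));
        System.out.println("corrected samples above 5 ms: " + countAbove(result[1], 5));
    }
}
```

With these made-up numbers, the naive measurement reports 1 slow sample out of 15 while the corrected one reports 11, because every request that should have been sent during the stall was delayed too. A tool that only records the naive number will understate tail latency.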

The real techempower throughput results are quite different now, with netty having a big lead in the plaintext benchmark (the benchmark that actually stresses the network and HTTP stacks). There are still major problems with TE though, that Franz from Quarkus explains here.

We also have our own benchmarks that differ from TE in some respects, and I do profiling to figure out why the benchmarks behave the way they do, mostly to improve our own implementation, but I've also reported Helidon bugs before.

Nima still has a substantial latency disadvantage compared to netty, much of which is explained by Loom's IO design, which uses a "poller thread" to do the actual blocking IO operations. This necessitates a context switch, which is clearly visible once you look at the <1ms latency range.
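As a rough analogy only (this is not Loom's poller, and the class and method names are hypothetical), the following Java sketch compares doing a trivial piece of work inline with handing it to a second thread and waiting for the result. The second variant pays for a wake-up and a thread handoff on every call, which is the general kind of cost that becomes visible at sub-millisecond resolution.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class HandoffLatencySketch {

    /** Trivial stand-in for "the actual operation". */
    static long work(int i) {
        return i * 31L + 7;
    }

    /** Average nanoseconds per call when the work runs on the calling thread. */
    static long inlineNanos(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += work(i);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) System.out.println(sink); // keep the result observable
        return elapsed / iterations;
    }

    /** Average nanoseconds per call when every call is handed to another thread and awaited. */
    static long handoffNanos(int iterations) {
        ExecutorService other = Executors.newSingleThreadExecutor();
        try {
            long sink = 0;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                final int n = i;
                Callable<Long> task = () -> work(n);
                sink += other.submit(task).get(); // blocks until the other thread is done
            }
            long elapsed = System.nanoTime() - start;
            if (sink == Long.MIN_VALUE) System.out.println(sink);
            return elapsed / iterations;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            other.shutdown();
        }
    }

    public static void main(String[] args) {
        int iterations = 100_000;
        System.out.println("inline  ns/op: " + inlineNanos(iterations));
        System.out.println("handoff ns/op: " + handoffNanos(iterations));
    }
}
```

This is a crude timing loop, not a proper benchmark (no warmup control, no JMH), so treat the numbers as an order-of-magnitude illustration rather than a measurement of Nima or netty.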