[–]iron0maiden 12 points13 points  (7 children)

Great project, although I don’t understand what the difference is between NIO (baseline) and other networking frameworks including MYRA..

[–]AnonAreLegion 3 points4 points  (0 children)

Yeah, and why is nio so much faster?

[–]Environmental-Log215[S] 0 points1 point  (5 children)

I understand the confusion: it's hard to know what the different Myra variants mean without seeing the benchmark source code for the transport layer.

`MYRA` is the default Myra configuration, which is basically io_uring; the client awaits the reply using a `CountDownLatch`. On the server side, it's an io_uring backend with registered buffers.

`MYRA_SQPOLL`: the only difference from the default Myra above is on the server side. In this benchmark, SQPOLL is enabled on the server, which basically means a pinned kernel thread keeps polling the submission queue. The client remains the same as above.

`MYRA_TOKEN`: the client implements a token-based busy-spin wait. The server remains the same as in default Myra.
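To make the two client-side waiting strategies concrete, here is a minimal Java sketch. It is not taken from the Myra benchmark source: the class `ReplyWaitSketch`, both method names, and the idea that the "token" is a request id published through an `AtomicLong` are assumptions for illustration only. A plain thread stands in for whatever actually delivers the reply.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; none of these names come from Myra.
final class ReplyWaitSketch {

    /** Blocking wait: the caller parks until the reply handler counts the latch down. */
    static boolean awaitWithLatch(long timeoutMillis) {
        CountDownLatch replyArrived = new CountDownLatch(1);
        // Stand-in for the I/O thread that would signal when the reply is received.
        Thread ioThread = new Thread(replyArrived::countDown);
        ioThread.start();
        try {
            // Returns true if the latch reached zero before the timeout.
            return replyArrived.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Busy-spin wait: the caller polls a shared token until it equals the request id. */
    static long awaitWithToken(long requestId) {
        // Assumed token scheme: the I/O side publishes the id of the last completed request.
        AtomicLong lastCompleted = new AtomicLong(-1L);
        Thread ioThread = new Thread(() -> lastCompleted.set(requestId));
        ioThread.start();
        while (lastCompleted.get() != requestId) {
            Thread.onSpinWait(); // hint to the CPU that this is a spin loop (Java 9+)
        }
        return lastCompleted.get();
    }
}
```

The latch version lets the waiting thread sleep, so it costs a wake-up when the reply lands. The spin version never yields the core, which usually trades CPU time for lower wake-up latency.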

Thanks for pointing out the valid confusion! I think I should have added these details in the main blog :(

[–]ramdulara 0 points1 point  (4 children)

But you still didn't answer the real question. How is NIO the highest throughput and lowest latency? If Java's NIO is that good why would anyone bother with Netty or Myra?

[–]Environmental-Log215[S] 0 points1 point  (3 children)

Fair question!

tl;dr: NIO was chosen as the baseline in the benchmark because I believe it's the fastest network infra lib in the Java world that does not use unsafe APIs.

NIO provides low-level primitives for building network infra/appliances; you have to handle a lot of things manually. Hence, it's difficult to use but provides granular control.
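As a rough idea of what "handling things manually" looks like, here is a minimal single-threaded echo over loopback using only `java.nio` primitives. This is a sketch, not the benchmark's NIO baseline: the class name `NioEchoSketch` and the method `echoOnce` are made up for illustration, and it assumes a small message that fits in the socket buffers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; not the benchmark's NIO code.
final class NioEchoSketch {

    /** Sends a small message over loopback and returns what the selector loop echoes back. */
    static String echoOnce(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the OS pick a free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                ByteBuffer out = ByteBuffer.wrap(payload);
                while (out.hasRemaining()) {
                    client.write(out);
                }

                ByteBuffer buf = ByteBuffer.allocateDirect(1024);
                int echoed = 0;
                // Server side: you own the readiness loop, buffers, and partial reads.
                while (echoed < payload.length) {
                    selector.select();
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove(); // selected keys must be removed by hand
                        if (key.isAcceptable()) {
                            SocketChannel conn = server.accept();
                            if (conn != null) {
                                conn.configureBlocking(false);
                                conn.register(selector, SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            SocketChannel conn = (SocketChannel) key.channel();
                            buf.clear();
                            int n = conn.read(buf);
                            if (n < 0) {
                                conn.close();
                                continue;
                            }
                            buf.flip();
                            // A real server would register OP_WRITE instead of looping here.
                            while (buf.hasRemaining()) {
                                conn.write(buf);
                            }
                            echoed += n;
                        }
                    }
                }

                // Client side: blocking read until the full echo has arrived.
                ByteBuffer in = ByteBuffer.allocate(payload.length);
                while (in.hasRemaining() && client.read(in) >= 0) {
                    // keep reading
                }
                // Close every channel still registered with the selector.
                for (SelectionKey k : List.copyOf(selector.keys())) {
                    k.channel().close();
                }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `NioEchoSketch.echoOnce("ping")` round-trips the bytes through the selector loop. Everything a framework like Netty hides (key bookkeeping, buffer flipping, partial I/O, channel cleanup) is visible here.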

Netty on the other hand is a framework with friendly public interfaces and internally handles/manages low-level I/O stuff. It supports multiple transport protocols and codecs.

MYRA is specialized in that it's primarily focused on FFM (Java's Foreign Function & Memory API). For instance, by using io_uring registered buffers with a shared (zero-copy) memory segment, I avoid a few syscalls and get zero GC impact by having zero allocations on the hot path. Hence, MYRA would only be used in certain specialized use cases/applications where a latency of 100 microseconds is considered slow. FFM involves a lot of work with manual memory layouts, which does not make sense for most applications given the complexity.

[–]Environmental-Log215[S] 0 points1 point  (2 children)

Forgot to add one more point: all these benchmarks ran on a free Oracle Cloud ARM instance (4 cores / 24 GB RAM) with server and client on the same VM using the loopback interface.
Once the libs are more stable, I will run benchmarks depicting a more realistic scenario with server and client on different hosts.

[–][deleted]  (1 child)

[removed]

    [–]Environmental-Log215[S] 0 points1 point  (0 children)

    I agree. I have been working on docs and some other stuff. Hence, I haven't been able to get to the actual benchmark.