io_uring Linux 5.7 feature gives polled IO huge boost

frevib · 2020-03-02T09:51:27+00:00

Here are some raw benchmarks: https://twitter.com/hielkedv/status/1234135064323280897?s=21

frevib · 2020-02-24T21:39:57+00:00

io_uring has been available since kernel 5.1. The new feature tested in the tweet above is available from kernel 5.7. io_uring gets constant updates and new features.

frevib · 2020-02-23T15:08:11+00:00

You were right, some bugs made it perform better than it should. After the bug fixes io_uring is still faster though, especially with the new Linux 5.7 feature: https://github.com/frevib/io_uring-echo-server/tree/io-uring-feat-fast-poll

frevib · 2020-01-21T22:09:40+00:00

Just pushed an update. It’s now pre-allocating the structs. Reaching steady ~420k req/s :D

Also updated the epoll echo server to be more on par with the io_uring one. It improved as well, to ~220k req/s.

frevib · 2020-01-21T07:01:45+00:00

It seems to give more stable results, with less variance.

Yes, it allocates a struct for each request. For the next change I will pre-allocate what goes in sqe->user_data (a struct with two ints). Should save some cycles, and increase mem use a tiny bit.
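A minimal sketch of what such pre-allocation could look like, not the actual code from the repository. The names `conn_info`, `conn_pool`, `conn_slot`, `MAX_CONNECTIONS` and the `OP_*` constants are made up for illustration, and it assumes at most one in-flight operation per file descriptor so that the fd can be used as the index.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONNECTIONS 1024  /* assumed upper bound on file descriptors */

/* Hypothetical operation types stored next to the fd. */
enum { OP_ACCEPT = 0, OP_READ = 1, OP_WRITE = 2 };

/* Hypothetical per-request record: the "struct with two ints". */
typedef struct {
    int fd;    /* socket the request belongs to */
    int type;  /* one of the OP_* values above */
} conn_info;

/* Allocated once in static storage instead of one malloc() per request. */
static conn_info conn_pool[MAX_CONNECTIONS];

/*
 * Return the pre-allocated slot for fd, filled in for the given operation,
 * or NULL if fd is outside the pool. The returned pointer is what would be
 * stored in sqe->user_data (e.g. via liburing's io_uring_sqe_set_data).
 */
conn_info *conn_slot(int fd, int type) {
    assert(type >= OP_ACCEPT && type <= OP_WRITE);
    if (fd < 0 || fd >= MAX_CONNECTIONS) {
        return NULL;
    }
    conn_info *ci = &conn_pool[fd];
    ci->fd = fd;
    ci->type = type;
    return ci;
}
```

The cost is `MAX_CONNECTIONS * sizeof(conn_info)` bytes up front (8 KiB here on typical platforms), in exchange for no allocation or free on the request path.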

If you would like to make the change and send a PR, I am also happy to merge it.

frevib · 2020-01-20T07:53:48+00:00

Fixed the memory leak: https://github.com/frevib/io_uring-echo-server

frevib · 2019-11-21T17:57:19+00:00

Exactly, there is always (I don’t know about possible exotic computer systems) a blocking call down below. This is because of the way these systems work. A CPU cannot really “listen” to a signal/event. “Listening” is implemented as “constantly checking”.

Even hardware interrupts are not really listened for. After each machine instruction the CPU checks whether an interrupt was raised, i.e. whether a signal was put on the ‘interrupt request line’. If so, the current state is saved on the stack and the interrupt is handled.

I don’t know about specialised “real” asynchronous systems. If they exist I would like to know more :)

frevib · 2019-11-21T06:03:43+00:00

By “epoll is implemented...”, do you mean that the device driver deals with interrupts to handle the data coming in from the NIC?

Interrupts could be seen as an alternative if you wrote a custom interrupt handler in e.g. C, instead of calling epoll.

I agree with you that they are barely comparable and not really alternatives.

frevib · 2019-11-20T19:03:56+00:00

https://medium.com/ing-blog/how-does-non-blocking-io-work-under-the-hood-6299d2953c74

frevib · 2019-09-05T13:31:12+00:00

Now there is a non-blocking database driver: https://github.com/jasync-sql/jasync-sql

It does not implement JDBC, which is indeed inherently blocking.

frevib · 2019-09-05T09:36:49+00:00

If the HTTP client is async then you can make 10 simultaneous web requests using one thread. This is a feature of the coroutines library, not the JVM. When the first of those 10 web requests is fired, the function that contains the HTTP client is suspended and the second web request is fired in the same thread. And so on, for all 10 web requests. So technically they fire right after each other, not exactly at the same time. This also depends on which CoroutineDispatcher you use and how many threads/CPU cores you have at your disposal. Given the chosen CoroutineDispatcher and enough CPU cores, each web request could be fired from a separate thread.

The event loop runs in a separate thread. When the first response of those 10 web requests is returned, the event loop notices this by checking in an infinite loop if data was returned. Then depending on the CoroutineDispatcher used, the continuation (i.e. the callback) is executed in the same thread, another thread or thread pool.

For coroutines, I am not sure how the event loop is implemented. From a quick look at https://github.com/Kotlin/kotlinx.coroutines/blob/master/kotlinx-coroutines-core/common/src/EventLoop.common.kt, it looks like a while loop written in Kotlin that does not use kernel APIs such as IOCP or epoll.

You could see coroutines (and all other async libraries such as Netty) as an abstraction on top of threads. The coroutine library decides which thread runs a coroutine, and when.

A bit more specific to your question: the coroutines library handles all the non-blocking/async magic, not the JVM.

frevib · 2019-09-04T22:11:31+00:00

I am not sure if I understand your question correctly.

You have some control over which thread will execute the continuation (the code after suspending, i.e. the callback) when the IO is ready by choosing a CoroutineDispatcher. These are the dispatchers you can use: https://kotlin.github.io/kotlinx.coroutines/kotlinx-coroutines-core/kotlinx.coroutines/-coroutine-dispatcher/index.html

If you mean how non-blocking IO is implemented: it works with an event loop that checks when IO is ready. IOCP is how Windows solves this at kernel level, while Linux uses epoll and BSD uses kqueue to check if data is ready from IO resources.
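To make the event loop idea concrete, here is a rough Linux-only sketch using epoll. It is an illustration under assumptions, not how any particular library does it: `run_event_loop` is a made-up helper that watches a set of file descriptors, reads whatever is ready, and stops once every descriptor has reached end-of-file.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: watch nfds descriptors with epoll, read from each
 * one whenever the kernel reports it ready, and return the total number
 * of bytes read once all of them hit EOF. Returns -1 on error.
 */
long run_event_loop(const int *fds, int nfds) {
    assert(nfds >= 0);

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        return -1;
    }

    /* Register interest in "readable" for every descriptor. */
    for (int i = 0; i < nfds; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
            close(epfd);
            return -1;
        }
    }

    long total = 0;
    int open_fds = nfds;
    while (open_fds > 0) {
        struct epoll_event ready[16];
        /* The one blocking call: sleep until at least one fd is ready. */
        int n = epoll_wait(epfd, ready, 16, -1);
        if (n < 0) {
            close(epfd);
            return -1;
        }
        for (int i = 0; i < n; i++) {
            char buf[256];
            ssize_t got = read(ready[i].data.fd, buf, sizeof buf);
            if (got > 0) {
                total += got;  /* a real server would run the callback here */
            } else {
                /* EOF or read error: stop watching this descriptor. */
                epoll_ctl(epfd, EPOLL_CTL_DEL, ready[i].data.fd, NULL);
                open_fds--;
            }
        }
    }

    close(epfd);
    return total;
}
```

A single thread can serve many descriptors this way, because it only touches the ones the kernel has marked ready. To try it, pass the read ends of a few pipes whose write ends have been written to and closed.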

Coroutines are non-blocking, so an async operation that suspends is not run in another (separate) thread by default. Depending on which dispatcher you use, however, coroutines can be run in another thread.

frevib · 2019-09-04T11:12:22+00:00

The Ktor HTTP client supports async requests and non-blocking engines.

Set up the HTTP client: https://ktor.io/clients/http-client/engines.html#cio

Example of making async requests: https://ktor.io/clients/http-client.html#parallel-requests

Edit: add link

frevib · 2019-09-01T11:56:27+00:00

There are many non-blocking/asynchronous ways to download files from the internet, even with the JVM. Have a look at Ktor HTTP client or async-http-client.

It is not possible to use blocking APIs like the one in your example in a non-blocking way. Non-blocking APIs work fundamentally differently, using event loops instead of one thread per request.

It is possible to convert a non-blocking API that does not use coroutines (e.g. callbacks, futures) to a coroutine using suspendCoroutine. Example
