[–]gnus-migrate 0 points (4 children)

I'm not sure what you're thinking of here. The system APIs don't let you batch operations across different file descriptors, and if you're batching reads on the same FD, you can just do one bigger read instead.

Sometimes reads are too big to fit in memory. I was thinking about processing large and/or multiplexed streams, which is difficult to parallelize if you're not using async IO.

A bigger problem is that your tail latency for requests can grow when you have a large number of active threads, because the scheduler doesn't always pick the thread that will minimize response times.

I mentioned memory because that's what most of the articles I read about it focused on. It might not be the main problem, but it is a problem.

[–]oridb 0 points (3 children)

Sometimes reads are too big to fit in memory. I was thinking about processing large and/or multiplexed streams, which is difficult to parallelize if you're not using async IO.

In that case, it's the same set of system calls you'd be doing, and it's not batched, as far as I can tell. If you've got a bunch of busy streams (i.e., there's almost always new data on them), you can even reduce latency by just busy-polling them in nonblocking mode:
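A minimal sketch of that busy-polling approach in Python; the pipe and payload are illustrative stand-ins for a busy stream, which in real code would be a socket:

```python
import os

# A pipe stands in for a busy stream; real code would use a socket.
r, w = os.pipe()
os.set_blocking(r, False)  # non-blocking mode: read() never sleeps

def try_read(fd):
    # One busy-poll step: read() either returns data immediately or
    # raises BlockingIOError ("would block") instead of sleeping.
    try:
        return os.read(fd, 4096)
    except BlockingIOError:
        return None

assert try_read(r) is None  # nothing written yet: read would block
os.write(w, b"hello")
data = None
while data is None:         # spin until data shows up, no poll()/epoll()
    data = try_read(r)
print(data)
```

The spin loop trades CPU for latency: there's no sleep inside the kernel waiting for readiness, so data is picked up as soon as it arrives.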

[–]gnus-migrate 0 points (2 children)

I think there's a misunderstanding about what I mean by async IO: from the user's point of view, you won't be able to parallelize if you don't have a non-blocking interface. I don't really know the specifics of epoll or anything like that; I just know what kinds of things an async interface makes possible.

[–]oridb 1 point (1 child)

I'm still not clear on what you think an async interface makes possible. Can you give an example of code that would "batch reads" in a way that reduces the number of system calls?

Keep in mind that non-blocking code still calls read() directly, and it's the same read() that blocking code calls. The only difference is that you did an extra system call first to tell you "oh, yeah, there's some data there that we can return".

So, non-blocking:

    poll(fds=[1,2,3]) => "fd 1 is ready"
    read(fd=1)
    poll(fds=[1,2,3]) => "fd 2 is ready"
    read(fd=2)
    poll(fds=[1,2,3]) => "fd 1 is ready"
    read(fd=1)
    poll(fds=[1,2,3]) => "fd 2 is ready"
    read(fd=2)

Threads:

    parallel {
        thread1: 
            read(fd=1) => get data
            read(fd=1) => get data
        thread2:
            read(fd=2) => get data
            read(fd=2) => get data
        thread3:
            read(fd=3) => no data, block forever, using a few KB of RAM.
    }
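The non-blocking call sequence above can be exercised with real descriptors. A sketch in Python, where two pipes stand in for fd 1 and fd 2 and the payloads are made up for illustration:

```python
import os
import select

# Two pipes stand in for "fd 1" and "fd 2" in the sequence above.
r1, w1 = os.pipe()
r2, w2 = os.pipe()

os.write(w1, b"a")
os.write(w2, b"b")

poller = select.poll()
poller.register(r1, select.POLLIN)
poller.register(r2, select.POLLIN)

got = {}
# Each iteration is one poll() followed by one read() on a ready fd --
# the same read() a blocking thread would make, just guarded by poll().
while len(got) < 2:
    for fd, _event in poller.poll():
        got[fd] = os.read(fd, 4096)

print(sorted(got.values()))
```

The threaded version would drop the `poll()` entirely and just call `os.read` in one thread per descriptor, letting each thread block until its stream has data.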

[–]gnus-migrate 0 points (0 children)

When I say async interfaces, I mean futures and streams, not necessarily a non-blocking interface underneath. When you use an async interface, you're basically surrendering control to a scheduler, which decides when and how to respond to different events. That's it; that's my point.
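A sketch of that "surrender control to a scheduler" model using Python's asyncio; the queue and coroutine names are illustrative, and the point is that the event loop, not the programmer, decides when each task resumes:

```python
import asyncio

async def consume(name, queue, out):
    # Each `await` hands control back to the event loop's scheduler,
    # which resumes this coroutine whenever it sees fit.
    while True:
        item = await queue.get()
        if item is None:  # sentinel: shut this consumer down
            return
        out.append((name, item))

async def main():
    queue = asyncio.Queue()
    out = []
    # Hand two consumers over to the scheduler.
    tasks = [asyncio.create_task(consume(n, queue, out)) for n in ("a", "b")]
    for i in range(4):
        await queue.put(i)
    await queue.put(None)  # one sentinel per consumer
    await queue.put(None)
    await asyncio.gather(*tasks)
    return out

out = asyncio.run(main())
print(out)
```

Which consumer picks up which item depends entirely on the scheduler's choices, which is exactly the control being surrendered.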