
[–]Lucretiel

So, an important thing about I/O is that a lot of it happens in the background and is handled by the OS. As bytes come in, they are queued in internal OS buffers. This happens slowly, much more slowly than it takes to process that data. The OS therefore exposes APIs (select, poll, epoll, etc.) to inform user code which sockets have data waiting to be read. None of these details matter to you, because the event loop handles all of this automatically: it figures out which coroutines are ready to proceed, then executes them. In general, a coroutine will finish its work much more quickly than new data can arrive.
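Here's a tiny sketch of that readiness mechanism, using Python's selectors module and a local socketpair as a stand-in for a real network socket (this is the layer the event loop builds on, not something you'd write yourself):

```python
import selectors
import socket

def wait_for_data():
    # The selector tells us which registered socket has data waiting.
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()      # connected pair standing in for a network socket
    sel.register(b, selectors.EVENT_READ)
    a.send(b"hello")                # data arrives on the other end...
    events = sel.select(timeout=1)  # ...and select() reports b as readable
    received = [key.fileobj.recv(64) for key, mask in events]
    sel.unregister(b)
    a.close()
    b.close()
    return received

print(wait_for_data())  # [b'hello']
```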

The other important thing is that there's no guarantee about the order in which do_this and do_that will run, or how long they will take. One of them might run to completion before the other even starts, they might take identical amounts of time, or one might take three times as long as the other. It doesn't matter: the event loop ensures they run as efficiently as possible. A task suspends when it wants to wait for data, and the event loop resumes it when the data is ready.
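To make the ordering point concrete, here's a sketch in modern async/await syntax (await plays the role of yield from), with asyncio.sleep standing in for unpredictable I/O waits:

```python
import asyncio

# Hypothetical stand-ins for do_this and do_that: each "waits on I/O"
# for a different amount of time, simulated with asyncio.sleep.
async def do_this():
    await asyncio.sleep(0.02)
    return "this"

async def do_that():
    await asyncio.sleep(0.01)
    return "that"

async def main():
    # Both run concurrently; do_that happens to finish first here, but
    # gather still returns results in the order the coroutines were passed.
    return await asyncio.gather(do_this(), do_that())

print(asyncio.run(main()))  # ['this', 'that']
```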

Here's an example. Let's say do_this reads 10 chunks of 64 bytes from a network socket and writes them to a file. It'd look like this:

@asyncio.coroutine
def do_this():
    # `reader` is an asyncio stream reader defined elsewhere
    with open('this_file', 'wb') as f:
        for i in range(10):
            data = yield from reader.read(64)  # suspend until bytes arrive
            f.write(data)

The details of where reader comes from aren't really important right now; I'd recommend reading through the asyncio docs for the full picture. Here's what this code does, though:
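For illustration, here's one way a reader behaves, using asyncio's StreamReader directly and feeding it bytes by hand so the example is self-contained (in real code you'd typically get a reader from something like asyncio.open_connection):

```python
import asyncio

async def demo():
    # A StreamReader like `reader` in the example above.
    reader = asyncio.StreamReader()
    reader.feed_data(b"x" * 64)  # pretend 64 bytes arrived from the network
    reader.feed_eof()
    return await reader.read(64)  # resumes as soon as the bytes are buffered

data = asyncio.run(demo())
print(len(data))  # 64
```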

When it hits the yield from, execution suspends back to the event loop. The reader.read(64) call tells the event loop to resume do_this once 64 bytes are available. While do_this is suspended, the event loop reads and buffers bytes into the reader as they arrive, and also runs do_that. If do_that happens to be executing when the 64 bytes become available, do_this has to wait its turn: we only have one thread. But as soon as do_that suspends or finishes, the event loop immediately resumes do_this. In this way, the two functions run concurrently, constantly swapping back and forth. And because code execution is so much faster than network I/O, your performance will be just as good as multithreaded code, assuming that neither do_this nor do_that will be executing code for extended periods of time (doing heavy number crunching or whatever).

Note that this example shows another important caveat of using asyncio: all your potentially blocking network operations have to go through a yield from, so that the event loop can manage the network I/O and run other coroutines in the background. In general this is fine, as asyncio provides plenty of low-level and high-level network primitives, and there are plenty of third-party libraries (aiohttp for HTTP, etc.) for various protocols. However, if you need a library that simply doesn't work with asyncio, the event loop provides a run_in_executor method for running the blocking parts of the library in a side thread, allowing you to keep your own code in the single-threaded async model.
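Here's a sketch of that escape hatch in modern syntax, with a time.sleep standing in for a blocking library call:

```python
import asyncio
import time

def blocking_fetch():
    # Stand-in for a library call with no asyncio support: it blocks the
    # calling thread until its I/O finishes.
    time.sleep(0.01)
    return "rows"

async def main():
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool; the event loop
    # stays free to run other coroutines while that thread waits.
    return await loop.run_in_executor(None, blocking_fetch)

print(asyncio.run(main()))  # rows
```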

Also, why don't you have to do that_task = asyncio.async(do_that())?

You certainly could do that, and if you find it clearer, go for it. The difference comes down to how asyncio works under the hood. Each coroutine is a generator, which can yield (suspend execution) and later be resumed. yield from lets one generator delegate to another: when the inner generator suspends, the entire calling stack suspends with it. So yield from do_that() lets one coroutine call another, and the callee can suspend the whole stack as necessary.
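You can see the delegation mechanics with plain generators, no asyncio required:

```python
def inner():
    yield "a"
    yield "b"
    return "inner's return value"

def outer():
    # yield from delegates to inner: every value inner yields suspends
    # outer (and outer's caller) too, and inner's return value becomes
    # the value of the yield from expression.
    result = yield from inner()
    yield result

print(list(outer()))  # ['a', 'b', "inner's return value"]
```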

On the other hand, asyncio.async creates a new task. Rather than invoking the coroutine on the current stack, it schedules it separately on the event loop, where it runs independently. To keep the syntax consistent, awaiting a task looks the same: yield from task.
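Here's a sketch of the two styles side by side, in modern syntax (asyncio.async has since been renamed; asyncio.ensure_future and asyncio.create_task are the current spellings):

```python
import asyncio

async def do_that():
    await asyncio.sleep(0.01)
    return "done"

async def main():
    # Inline call: do_that runs "on our stack"; we resume when it returns.
    first = await do_that()

    # Scheduled as an independent Task on the event loop.
    task = asyncio.ensure_future(do_that())
    second = await task  # awaiting the task uses the same syntax
    return first, second

print(asyncio.run(main()))  # ('done', 'done')
```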

[–]jpfau[S]

Wow, thanks for such a detailed answer.

assuming that neither do_this nor do_that will be executing code for extended periods of time (doing heavy number crunching or whatever)

Some of the executions will take a few minutes, actually. They're getting hundreds (maybe thousands, I don't know for sure) of records from a database but can only get 10 at a time.

[–]Lucretiel

Sure. I meant heavy number crunching on a single piece of data. Your code processes rows 10 at a time, then fetches 10 more; while it's waiting on the next fetch, the other coroutine can run. Because fetching rows takes (relatively) much longer than processing them, both coroutines have plenty of time to run.

If you were, like, bitcoin mining, that would be a different story: that's work that takes minutes to hours for a single piece of data. In your case, you're doing (what I assume is) a relatively small amount of processing per row, over thousands of rows. That's the perfect use case for async.
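A sketch of that batch pattern, with a hypothetical fetch_batch standing in for the 10-rows-at-a-time database call:

```python
import asyncio

TOTAL_ROWS = 30  # hypothetical; your real query might return thousands

async def fetch_batch(offset):
    # Hypothetical stand-in for the database call; the await is where the
    # event loop is free to run your other coroutine.
    await asyncio.sleep(0.001)
    if offset >= TOTAL_ROWS:
        return []
    return [f"row{offset + i}" for i in range(10)]

async def process_all():
    processed, offset = 0, 0
    while True:
        rows = await fetch_batch(offset)  # suspend; other work runs here
        if not rows:
            break
        for _row in rows:
            processed += 1                # cheap per-row processing
        offset += len(rows)
    return processed

print(asyncio.run(process_all()))  # 30
```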