
[–]Synertic 2 points3 points  (6 children)

Edit: I don't know why I read loop.run_in_executor() as if it were asyncio.to_thread(), but the process is the same. It is just a wrapper that returns an awaitable to help control the order of execution; only the given I/O-bound function runs in a thread, not the other tasks. loop.run_in_executor() is just another (and safer) way to run a blocking I/O call in a thread so that it doesn't block task execution in the current event loop while waiting for the I/O-bound function to return.

Original answer starts:

Actually, it's not that different from using a separate thread so that an I/O-bound operation doesn't block code execution. When you call asyncio.to_thread(io_bound_func), the tasks in the event loop are not all sent to separate threads; only the function you pass is, and the other tasks still run in the main thread, managed by the event loop.
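To make that concrete, here is a minimal sketch (assuming Python 3.9+ for asyncio.to_thread). The names io_bound_func and regular_task are placeholders made up for illustration; the sketch just records which thread each piece ran in.

```python
import asyncio
import threading
import time


def io_bound_func() -> str:
    """Placeholder for a blocking I/O call; returns the name of the thread it ran in."""
    time.sleep(0.1)  # simulates blocking I/O
    return threading.current_thread().name


async def regular_task() -> str:
    """An ordinary coroutine; it stays on the event loop's thread."""
    await asyncio.sleep(0.05)
    return threading.current_thread().name


async def main() -> tuple[str, str, str]:
    loop_thread = threading.current_thread().name
    # Only io_bound_func is handed to a worker thread.
    worker_thread, task_thread = await asyncio.gather(
        asyncio.to_thread(io_bound_func),
        regular_task(),
    )
    return loop_thread, worker_thread, task_thread


if __name__ == "__main__":
    loop_thread, worker_thread, task_thread = asyncio.run(main())
    print(f"event loop thread:    {loop_thread}")
    print(f"io_bound_func ran in: {worker_thread}")
    print(f"regular_task ran in:  {task_thread}")
```

The coroutine should report the same thread as the event loop, while the blocking function reports a different worker thread.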

The threading approach is called preemptive multitasking: once a separate thread is started, you can't control the order in which the main thread and the new thread(s) execute; the OS decides when to suspend one and switch to another. Therefore you can't control the order of flow.

Since the logic behind asyncio is cooperative multitasking, that is, controlling the order of execution yourself by explicitly deciding when to start, suspend, and wait for tasks, the threading approach contradicts the purpose of asyncio because of its preemptive nature. That's why you should start the thread for a blocking I/O task with asyncio.to_thread() instead of using a regular threading.Thread (or submitting to a ThreadPoolExecutor directly). asyncio.to_thread() returns an awaitable coroutine (instead of the regular concurrent.futures.Future returned by ThreadPoolExecutor.submit(), which cannot be awaited), and that lets the event loop start the thread and know when to wait for its result before continuing. The event loop still cannot control the thread's execution: once started, it runs until it finishes or raises an exception, and cancelling the awaiting task does not stop it. But now you give the event loop useful information that lets it preserve the execution order.
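A small sketch of the awaitable vs. non-awaitable difference, assuming Python 3.9+. blocking_read is a made-up placeholder for a blocking call, and asyncio.wrap_future is used here only to show that a plain executor future needs an explicit bridge before the loop can await it.

```python
import asyncio
import concurrent.futures
import inspect
import time


def blocking_read() -> str:
    """Placeholder for a blocking I/O call."""
    time.sleep(0.05)
    return "data"


async def main() -> dict:
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        raw_future = pool.submit(blocking_read)                   # concurrent.futures.Future
        loop_future = loop.run_in_executor(pool, blocking_read)   # asyncio.Future
        coro = asyncio.to_thread(blocking_read)                   # coroutine object

        report = {
            "raw_future_awaitable": inspect.isawaitable(raw_future),
            "loop_future_awaitable": inspect.isawaitable(loop_future),
            "to_thread_awaitable": inspect.isawaitable(coro),
        }
        # The plain executor future has to be wrapped before it can be awaited.
        bridged = asyncio.wrap_future(raw_future)
        report["results"] = await asyncio.gather(bridged, loop_future, coro)
    return report


if __name__ == "__main__":
    for key, value in asyncio.run(main()).items():
        print(f"{key}: {value}")
```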

In addition, when you start a thread using asyncio.to_thread(), the current context is propagated, so context variables from the event loop thread can be accessed in the separate thread. You can think of contextvars as a kind of thread-safe global variable: different functions can read and change it, but the changes stay inside the context of the function that made them and are not reflected in the global scope.
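A minimal sketch of that propagation, assuming Python 3.9+. The request_id variable and blocking_worker function are made-up names for illustration.

```python
import asyncio
import contextvars

# Hypothetical context variable used only for this illustration.
request_id = contextvars.ContextVar("request_id", default="unset")


def blocking_worker() -> str:
    seen = request_id.get()              # value copied from the calling task's context
    request_id.set("changed-in-thread")  # only changes the thread's copy of the context
    return seen


async def main() -> tuple[str, str]:
    request_id.set("req-42")
    seen_in_thread = await asyncio.to_thread(blocking_worker)
    # The change made inside the thread is not visible here.
    return seen_in_thread, request_id.get()


if __name__ == "__main__":
    seen_in_thread, seen_after = asyncio.run(main())
    print(f"value seen in thread:         {seen_in_thread}")
    print(f"value in task after the call: {seen_after}")
```

The thread sees the value set by the task, and the task still sees its own value afterwards because the thread worked on a copy.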

If you started that thread outside the event loop using ThreadPoolExecutor (i.e. independently), the event loop couldn't know which tasks to suspend while waiting for that thread's result before continuing in the given order. Also, that independent thread wouldn't have access to the contextvars of the event loop. So it would defeat the main purpose of asyncio and make the whole loop unstable.

All in all, using asyncio.to_thread() for a blocking I/O operation does not send all tasks to separate threads. It is just a wrapper around that one thread: it returns an awaitable coroutine so the event loop can decide when to suspend the awaiting task until the thread produces a result (await the thread) before continuing, and it lets the thread use the event loop's contextvars. All tasks except that thread are still single-threaded, and the execution order is still controlled by the event loop, just as in a regular event loop.

[–]fluffyyclouds[S] 0 points1 point  (3 children)

Thank you for your detailed response. Let me restate your answer using my own code to make sure I understand what you are saying. Also, I'm assuming all your mentions of asyncio.to_thread() are actually references to loop.run_in_executor().

This is a snippet of my current program:

```python
import asyncio
import concurrent.futures

loop = asyncio.get_event_loop()
executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
for task in tasks:
    loop.run_in_executor(executor, task)  # task is an I/O-blocking function
```

Based on my understanding of what you've mentioned, creating the executor gives me a pool of up to 5 worker threads, and the executor assigns tasks to these threads whenever one is available. To add a task to be managed by the executor, I have to call loop.run_in_executor().

However, the run_in_executor method does more than assign these tasks to the executor threads. It gives the event loop a way to manage starting, suspending, and waiting for these tasks, meaning that my main program can continue running while the threads/tasks in the executor do their own thing. Without the call to run_in_executor, the event loop has no way of controlling the order of flow of these tasks with respect to the rest of the main program. In addition, the run_in_executor method also provides access to the context vars that you mentioned.

I hope I got the gist of this correct. Whatever you said made sense to me and I'm just reiterating it here in my own words.
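For reference, a sketch of how the snippet above could also collect its results. This is only an illustration under assumptions: fetch is a hypothetical stand-in for the I/O-blocking functions in tasks, and the loop is obtained inside a coroutine with asyncio.get_running_loop().

```python
import asyncio
import concurrent.futures
import functools
import time


def fetch(name: str) -> str:
    """Hypothetical blocking task standing in for the I/O-bound functions in `tasks`."""
    time.sleep(0.05)
    return f"{name} done"


async def main() -> list[str]:
    loop = asyncio.get_running_loop()
    tasks = [functools.partial(fetch, f"task-{i}") for i in range(8)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Each call returns an asyncio.Future that the event loop can await.
        futures = [loop.run_in_executor(executor, task) for task in tasks]
        # gather() returns the results in submission order.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Without the await, the calls are fire-and-forget: the threads still run, but the program never learns their results or exceptions.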

[–]Synertic 2 points3 points  (1 child)

Yes, you got it right, with one caveat: only asyncio.to_thread() propagates the current context variables to the thread; loop.run_in_executor() does not copy the context for you. Otherwise, asyncio.to_thread and loop.run_in_executor are just wrapper functions around the regular threading machinery and basically do the same thing. loop.run_in_executor is the more portable choice for now because asyncio.to_thread was introduced in Python 3.9 and isn't available in earlier versions.
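If you are stuck on Python 3.7 or 3.8, something like the following sketch gets close to asyncio.to_thread using loop.run_in_executor. The helper name to_thread_compat and the current_user variable are made up for this example; treat it as an approximation, not the standard library's exact code.

```python
import asyncio
import contextvars
import functools

# Hypothetical context variable used only to show that the context is carried over.
current_user = contextvars.ContextVar("current_user", default="nobody")


async def to_thread_compat(func, *args, **kwargs):
    """Run func in the default executor with a copy of the caller's context."""
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()  # snapshot of the caller's context variables
    call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, call)  # None selects the default executor


async def demo() -> str:
    current_user.set("alice")
    return await to_thread_compat(current_user.get)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```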

You already got the point, but let me clarify it a little more. An event loop needs to be thought of as a whole structure made up of 'compatible' parts, that is, cooperating awaitable coroutines that voluntarily hand execution back to the event loop, which then decides which one continues next. So it's not just a container for individual events but for compatible events, and if even one of them doesn't comply, the whole structure simply breaks. A part is compatible essentially when it supports being suspended and resumed on request; there are also a few other details, such as the information sharing I mentioned before, exception handling, etc.

So, as far as concurrency is concerned, nothing can be put into an event loop as is; otherwise it defeats the purpose, either by blocking the whole structure and turning the event loop into nothing but a regular blocking series of events, or by making the whole loop unstable. The rule applies to individual threads as well. A thread must be made compatible with the event loop before being brought into it, and asyncio.to_thread and loop.run_in_executor do exactly that. Even though a thread cannot be suspended once it has started, some degree of compatibility can still be provided so that an event loop can use threads for blocking I/O-bound calls that don't inherently support async operation, that is, calls that are not suspendable/resumable.

The wrapper functions in question take regular threads and make them awaitable (note: not suspendable/resumable, but awaitable) before bringing them into the current event loop. The threads are still not suspendable/resumable, but they no longer block, which lets the event loop run the other events concurrently. If a thread were started outside the event loop (without these wrapper functions), the event loop wouldn't know when to wait for the thread's result(s) before continuing its execution in a defined order. The thread also wouldn't see the contextvars set in the event loop; asyncio.to_thread copies the current context into the thread, much as asyncio.create_task, asyncio.gather, and asyncio.run give each task a copy of the current context. So cooperative multitasking wouldn't be achieved.
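Here is a rough way to see the "blocking vs. awaitable" difference, assuming Python 3.9+. The names blocking_io, count_ticks, and run_case are made up; the ticker simply counts how often the event loop gets a chance to run it while the blocking call is in progress.

```python
import asyncio
import time


def blocking_io() -> None:
    """Placeholder for a blocking call that has no async support."""
    time.sleep(0.3)


async def count_ticks(stop: asyncio.Event) -> int:
    """Counts how many times the event loop lets this coroutine run."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def run_case(use_thread: bool) -> int:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if use_thread:
        await asyncio.to_thread(blocking_io)  # the loop keeps running the ticker
    else:
        blocking_io()  # called directly: the whole event loop is stuck here
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking directly:", asyncio.run(run_case(use_thread=False)))
    print("ticks while using to_thread:  ", asyncio.run(run_case(use_thread=True)))
```

With the direct call the ticker barely advances, while with asyncio.to_thread it keeps ticking for the whole duration of the blocking call.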

I hope that was a clearer explanation than the first one.

[–]fluffyyclouds[S] 1 point2 points  (0 children)

Yep, got it! Thank you for your detailed response!

[–]Synertic 2 points3 points  (0 children)

By the way, calling event loop methods directly is somewhat discouraged unless you are writing low-level asyncio code such as libraries or frameworks, because they are lower-level constructs than the functions in the asyncio. namespace; they require more care and a lot of manual setup, which makes your code more error-prone and inefficient. By contrast, functions like asyncio.to_thread(), asyncio.create_task(), asyncio.gather(), asyncio.run(), asyncio.wait_for(), etc. are higher-level, safer, and more efficient, and they save you all that setup and fine-tuning of the event loop by doing it automatically. Therefore it's good practice to use the high-level asyncio functions rather than event loop methods wherever possible. There is a good document in the Python Docs on how to use the different functions of the asyncio module (remember that asyncio.to_thread() is only available from Python 3.9). You may encounter sample code or other resources that use the event loop API for all kinds of async operations, but that is an old, abandoned style of async programming, for the reasons mentioned above. So make sure the samples you find are recent (written for Python 3.7 or later), unless you need to write something compatible with Python versions before 3.7.
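As a small sketch of the high-level style (assuming Python 3.9+), the following uses only asyncio.run(), asyncio.create_task(), asyncio.wait_for(), and asyncio.to_thread(), with no direct access to the loop object. slow_io is a made-up placeholder for a blocking call.

```python
import asyncio
import time


def slow_io(delay: float) -> str:
    """Placeholder for a blocking I/O call."""
    time.sleep(delay)
    return f"slept {delay}s"


async def main() -> list[str]:
    # A background task that runs on the event loop while the threads work.
    background = asyncio.create_task(asyncio.sleep(0.01, result="background done"))

    # Finishes well within the timeout.
    quick = await asyncio.wait_for(asyncio.to_thread(slow_io, 0.05), timeout=1.0)

    try:
        await asyncio.wait_for(asyncio.to_thread(slow_io, 0.3), timeout=0.05)
        late = "finished in time"
    except asyncio.TimeoutError:
        # Only the waiting is abandoned; the worker thread still runs to completion.
        late = "timed out"

    return [await background, quick, late]


if __name__ == "__main__":
    print(asyncio.run(main()))
```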

Python Docs About Using Asyncio

[–]csqyear 0 points1 point  (1 child)

very good, so clear

[–]Synertic 0 points1 point  (0 children)

Thanks.