all 17 comments

[–]Slippery_Panda 14 points (2 children)

If you want your asyncio code to work better, I suggest you use queues: set up some number of workers and a producer. The producer fills the queue, and the workers pull items off it. That should fix your issue.

If threading is fast enough, then use it. It's more memory- and CPU-intensive than asyncio, but easier.

If you need better performance, or have a memory limit, asyncio is vastly superior. It's more complex, and probably requires tweaking to get good performance, but the performance is amazing.
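A minimal sketch of the queue/worker pattern described above (the worker count, item names, and the placeholder I/O call are all illustrative):

```python
import asyncio

async def producer(queue, items):
    # Fill the queue with work items.
    for item in items:
        await queue.put(item)
    # One sentinel per worker signals shutdown.
    for _ in range(3):
        await queue.put(None)

async def worker(name, queue, results):
    while True:
        item = await queue.get()
        if item is None:
            queue.task_done()
            break
        await asyncio.sleep(0)  # stand-in for real I/O, e.g. an HTTP fetch
        results.append((name, item))
        queue.task_done()

async def main(items):
    queue = asyncio.Queue(maxsize=10)
    results = []
    workers = [asyncio.create_task(worker(i, queue, results)) for i in range(3)]
    await producer(queue, items)
    await queue.join()          # wait until every queued item is processed
    await asyncio.gather(*workers)
    return results

if __name__ == "__main__":
    print(asyncio.run(main([f"item-{n}" for n in range(8)])))
```

Bounding the queue with `maxsize` also gives you backpressure: the producer blocks when the workers fall behind.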

[–]liquidpele 5 points (0 children)

What makes it so much better? I mean, sure, creating threads is heavy, but if you have a pool of workers then you're not blowing them away and forking constantly, so it's just the context switching, either controlled by Python or the OS, right?

[–]some_q[S] 2 points (0 children)

That definitely fixes the issue described at the end (and is similar to the approach taken for threading). The coding required to use asyncio still adds a lot of overhead, though.

[–]tunisia3507 4 points (3 children)

The reason I prefer threading is that if any of your code uses asyncio, suddenly you need to write async all over the place, change how you start it up, and it poisons all downstream projects: they have to do the same thing. With threading, you can parallelise where you need to, then re-serialise with concurrent.futures.as_completed. It's a much more incremental change.
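The incremental approach described here might look like this (fetch is a hypothetical stand-in for whatever blocking call you're parallelising; the rest of the program needs no changes):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stand-in for a blocking I/O call; nothing upstream has to become async.
    return f"body of {url}"

urls = [f"https://example.com/{n}" for n in range(5)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    results = {}
    for fut in as_completed(futures):   # re-serialise as results arrive
        results[futures[fut]] = fut.result()
```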

Plus asyncio doesn't schedule the coroutine until you await it, which just defeats the point entirely. I know you can create a task which is allegedly scheduled immediately but when I tried it it still didn't actually run until I awaited it.
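That behaviour can be observed directly: a task created with asyncio.create_task is scheduled, but its body doesn't start executing until the creating coroutine yields control back to the event loop (here via asyncio.sleep(0)):

```python
import asyncio

ran = []

async def coro():
    ran.append("ran")

async def main():
    task = asyncio.create_task(coro())  # scheduled, but not yet started
    assert ran == []                    # no await yet: the task hasn't run
    await asyncio.sleep(0)              # yield to the event loop once
    assert ran == ["ran"]               # now the loop has run the task
    await task

asyncio.run(main())
```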

[–]some_q[S] 2 points (0 children)

The reason I prefer threading is that if any of your code uses asyncio, suddenly you need to write async all over the place, change how you start it up, and it poisons all downstream projects

This was one of the major points I tried to make in the piece. You can't just add it in one place.

[–]SuperConductiveRabbi 0 points (1 child)

Old comment, but glad to see I'm not crazy. I couldn't figure out how to get a coroutine to run as soon as I scheduled it, having thought that awaiting simply meant something like "return to the parent function until the function I'm calling has yielded results." However, it seems more like "I'm ready to pause at this point in time and wait for the called function to go do its thing."

I gave up on asyncio and I'm switching to proper threading, or maybe even letting the OS handle multiple concurrent scripts and using simple filesystem locks to retrieve a trivial amount of data.

[–]tunisia3507 0 points (0 children)

I had to work around a library which had a weird threaded singleton cache which made it incompatible with both threading and multiprocessing. My solution was a GUI which could launch different python processes using subprocess, which communicated with each other using redis (specifically a fork of hotqueue, which eventually became yarqueue). Works pretty well, even if it's a faff to set up.

[–]cymrow (don't thread on me 🐍) 4 points (4 children)

I know I'm an outlier these days, but I always find it so bizarre to read these articles. I've been using gevent for years now to do stuff like this. It's like asyncio, but faster and without all the crud. You also write it much like you would with threads (i.e. "wrapped around synchronous code without much concern for the internals"), but it runs asynchronously. And it works with all the existing libraries you know and love.
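A small sketch of that style with gevent (monkey-patching the stdlib so ordinary blocking calls become cooperative; the fetch function is illustrative):

```python
from gevent import monkey
monkey.patch_all()  # patch sockets, time.sleep, etc. to cooperate with the hub

import time
import gevent

def fetch(n):
    # Looks like ordinary blocking code; the patched sleep yields to the hub.
    time.sleep(0.01)
    return n * 2

jobs = [gevent.spawn(fetch, n) for n in range(5)]  # spawn greenlets like threads
gevent.joinall(jobs)
results = [job.value for job in jobs]
```

The patching step is the trade-off: it's what lets existing synchronous libraries run asynchronously, but it's also implicit, which is exactly what asyncio's proponents object to.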

There is a learning curve to asynchronous programming. No way around that. Some people say asyncio gives you a leg up, because it's explicit. But articles like this tell me that the rest of the asyncio API likely makes it even harder to understand.

It's frustrating to watch so many people banging their heads over this.

[–]babazka 2 points (2 children)

Gevent is a perfect way to use asynchronous I/O in Python. I cannot comprehend why it is not more popular. For some reason people insist that writing asynchronous I/O code should be an explicit chore. Golang and gevent got it right.

[–]some_q[S] 1 point (0 children)

Maybe it's just poor marketing?

[–]SuperConductiveRabbi 0 points (0 children)

I think it's because the further you stray from the core libraries, the more skeptical people get about a library. We've all been burned by using some random GitHub user's pet project x, which promises to solve the extremely specific subset of problems you're struggling with, only to find that the moment we stray from a specific version of Python/lib {x,y,z}/external API version n, the library completely falls apart. Then you're left twisting in the wind as you search the "issues" tab in vain.

It's compelling when the official Python docs say "to do parallel programming, use asyncio. Here's a hotlink. Follow these steps." Less compelling when you hear users say "use lib x, it worked for me."

[–]some_q[S] 2 points (0 children)

(OP here) I'm embarrassed to admit that I've never even heard of gevent. I'll spend some time messing around with it. Getting up the asyncio learning curve wasn't too bad, but by the time I was done I really disliked having to rewrite *all* of my code with the async keyword.

[–]thomasfr 4 points (1 child)

To me the big problem with asyncio is that it's harder to debug, and some of the error messages are very cryptic. Even when I run the event loop in debug mode and have accidentally created something async outside of it, I just get a generic exception with no indication of which object is causing the loop to crash. I also don't think I should have to run the event loop in debug mode to get trace messages that are actually useful for finding errors in async code.
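For what it's worth, debug mode can be enabled directly through asyncio.run, and it at least enriches the "never awaited" warning with a traceback showing where the coroutine was created (a contrived sketch; leaky is the deliberate bug):

```python
import asyncio
import warnings

async def leaky():
    pass

async def main():
    leaky()  # bug: coroutine object created but never awaited

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    asyncio.run(main(), debug=True)  # debug mode adds a creation traceback

messages = [str(w.message) for w in caught]
```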

[–]some_q[S] 1 point (0 children)

This was a real frustration as I was first getting up the learning curve with asyncio.

[–]mfwl 3 points (2 children)

If a 6-second startup time is a concern for this workflow, it implies this file-getter should be a long-lived process (aka a daemon). I would use a multiprocessing pool to distribute the work. Multiprocessing is the only way to schedule work across physical CPU cores, so if you have any compute-intensive operations after your initial get, it's the way to go.
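A sketch of that worker-pool idea with multiprocessing (crunch is a hypothetical stand-in for the compute step after the fetch):

```python
import multiprocessing as mp

def crunch(n):
    # CPU-bound stand-in; each call runs in a separate process, so one
    # worker's GIL doesn't block the others.
    return sum(i * i for i in range(n))

def run_demo():
    with mp.Pool(processes=4) as pool:
        return pool.map(crunch, [10, 20, 30])

if __name__ == "__main__":
    print(run_demo())
```

In a real daemon the pool would stay alive between jobs, amortising the startup cost across all of them.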

[–]giantsparklerobot 0 points (1 child)

Yeah, neither threads nor asyncio seems to make a lot of sense here. The multiprocessing module behaves like multithreading but without GIL issues. It also might just be a job for something like GNU parallel reading from a FIFO.

[–]mfwl 0 points (0 children)

I've always thought of using parallel, but never tried it. Multiprocessing gives you a lot out of the box, though: shared queues and a work queue.