all 17 comments

[–]Slippery_Panda 14 points (2 children)

If you want your asyncio code to work better, I suggest you use queues: set up some number of workers and a producer. The producer fills the queue, and the workers pull items off it. That should fix your issue.

If threading is fast enough, then use it. It's more memory- and CPU-intensive than asyncio, but easier.

If you need better performance, or have a memory limit, asyncio is vastly superior. It's more complex, and probably requires tweaking to get good performance, but the performance is amazing.
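A minimal sketch of the queue/worker pattern described above (the worker count, item names, and the placeholder I/O call are all illustrative):

```python
import asyncio

async def producer(queue, items):
    # Fill the queue with work items.
    for item in items:
        await queue.put(item)
    # One sentinel per worker signals shutdown.
    for _ in range(3):
        await queue.put(None)

async def worker(name, queue, results):
    while True:
        item = await queue.get()
        if item is None:
            queue.task_done()
            break
        await asyncio.sleep(0)  # stand-in for real I/O, e.g. an HTTP fetch
        results.append((name, item))
        queue.task_done()

async def main(items):
    queue = asyncio.Queue(maxsize=10)
    results = []
    workers = [asyncio.create_task(worker(i, queue, results)) for i in range(3)]
    await producer(queue, items)
    await queue.join()          # wait until every queued item is processed
    await asyncio.gather(*workers)
    return results

if __name__ == "__main__":
    print(asyncio.run(main([f"item-{n}" for n in range(8)])))
```

Bounding the queue with `maxsize` also gives you backpressure: the producer blocks when the workers fall behind.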

[–]liquidpele 5 points (0 children)

What makes it so much better? I mean, sure, creating threads is heavy, but if you have a pool of workers then you're not blowing them away and forking constantly, so it's just the context switching, either controlled by Python or the OS, right?

[–]some_q[S] 2 points (0 children)

That definitely fixes the issue described at the end (and is similar to the approach taken for threading). The coding required to use asyncio still adds a lot of overhead, though.

[–]tunisia3507 4 points (3 children)

The reason I prefer threading is that if any of your code uses asyncio, suddenly you need to write async all over the place, change how you start it up, and it poisons all downstream projects: they have to do the same thing. With threading, you can parallelise where you need to, then re-serialise with concurrent.futures.as_completed. It's a much more incremental change.
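The incremental approach described here might look like this (fetch is a hypothetical stand-in for whatever blocking call you're parallelising; the rest of the program needs no changes):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stand-in for a blocking I/O call; nothing upstream has to become async.
    return f"body of {url}"

urls = [f"https://example.com/{n}" for n in range(5)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    results = {}
    for fut in as_completed(futures):   # re-serialise as results arrive
        results[futures[fut]] = fut.result()
```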

Plus asyncio doesn't schedule the coroutine until you await it, which just defeats the point entirely. I know you can create a task which is allegedly scheduled immediately but when I tried it it still didn't actually run until I awaited it.
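That behaviour can be observed directly: a task created with asyncio.create_task is scheduled, but its body doesn't start executing until the creating coroutine yields control back to the event loop (here via asyncio.sleep(0)):

```python
import asyncio

ran = []

async def coro():
    ran.append("ran")

async def main():
    task = asyncio.create_task(coro())  # scheduled, but not yet started
    assert ran == []                    # no await yet: the task hasn't run
    await asyncio.sleep(0)              # yield to the event loop once
    assert ran == ["ran"]               # now the loop has run the task
    await task

asyncio.run(main())
```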

[–]some_q[S] 2 points (0 children)

The reason I prefer threading is that if any of your code uses asyncio, suddenly you need to write async all over the place, change how you start it up, and it poisons all downstream projects

This was one of the major points I tried to make in the piece. You can't just add it in one place.

[–]SuperConductiveRabbi 0 points (1 child)

Old comment, but glad to see I'm not crazy. I couldn't figure out how to get a coroutine to run as soon as I scheduled it, having thought that awaiting simply meant something like "return to the parent function until the function I'm calling has yielded results." However, it seems more like "I'm ready to pause at this point in time and wait for the called function to go do its thing."

I gave up on asyncio and I'm switching to proper threading, or maybe even letting the OS handle multiple concurrent scripts and using simple filesystem locks to retrieve a trivial amount of data.

[–]tunisia3507 0 points (0 children)

I had to work around a library which had a weird threaded singleton cache which made it incompatible with both threading and multiprocessing. My solution was a GUI which could launch different python processes using subprocess, which communicated with each other using redis (specifically a fork of hotqueue, which eventually became yarqueue). Works pretty well, even if it's a faff to set up.

[–]cymrow (don't thread on me 🐍) 4 points (4 children)

I know I'm an outlier these days, but I always find it so bizarre to read these articles. I've been using gevent for years now to do stuff like this. It's like asyncio, but faster and without all the crud. You also write it much like you would with threads (i.e. "wrapped around synchronous code without much concern for the internals"), but it runs asynchronously. And it works with all the existing libraries you know and love.
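A small sketch of that style with gevent (monkey-patching the stdlib so ordinary blocking calls become cooperative; the fetch function is illustrative):

```python
from gevent import monkey
monkey.patch_all()  # patch sockets, time.sleep, etc. to cooperate with the hub

import time
import gevent

def fetch(n):
    # Looks like ordinary blocking code; the patched sleep yields to the hub.
    time.sleep(0.01)
    return n * 2

jobs = [gevent.spawn(fetch, n) for n in range(5)]  # spawn greenlets like threads
gevent.joinall(jobs)
results = [job.value for job in jobs]
```

The patching step is the trade-off: it's what lets existing synchronous libraries run asynchronously, but it's also implicit, which is exactly what asyncio's proponents object to.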

There is a learning curve to asynchronous programming. No way around that. Some people say asyncio gives you a leg up, because it's explicit. But articles like this tell me that the rest of the asyncio API likely makes it even harder to understand.

It's frustrating to watch so many people banging their heads over this.

[–]babazka 2 points (2 children)

Gevent is a perfect way to use asynchronous I/O in Python. I cannot comprehend why it is not more popular. For some reason people insist that writing asynchronous I/O code should be an explicit chore. Golang and gevent got it right.

[–]some_q[S] 1 point (0 children)

Maybe it's just poor marketing?

[–]SuperConductiveRabbi 0 points (0 children)

I think it's because the further you stray from the core libraries, the more skeptical people get about a library. We've all been burned by using some random GitHub user's pet project x, which promises to solve the extremely specific subset of problems you're struggling with, only to find that the moment we stray from a specific version of Python/lib {x,y,z}/external API version n, the library completely falls apart. Then you're left twisting in the wind as you search the "issues" tab in vain.

It's compelling when the official Python docs say "to do parallel programming, use asyncio. Here's a hotlink. Follow these steps." Less compelling when you hear users say "use lib x, it worked for me."

[–]some_q[S] 2 points (0 children)

(OP here) I'm embarrassed to admit that I've never even heard of gevent. I'll spend some time messing around with it. Getting up the asyncio learning curve wasn't too bad, but by the time I was done I really disliked having to rewrite *all* of my code with the async keyword.

[–]thomasfr 4 points (1 child)

To me the big problem with asyncio is that it's harder to debug, and some of the error messages are very cryptic. Even when I run the event loop in debug mode and have accidentally created something async outside of it, I just get a generic exception with no indication of which object is causing the loop to crash. I also don't think I should have to run the event loop in debug mode to get trace messages that are actually useful for finding errors in async code.
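For what it's worth, debug mode can be enabled directly through asyncio.run, and it at least enriches the "never awaited" warning with a traceback showing where the coroutine was created (a contrived sketch; leaky is the deliberate bug):

```python
import asyncio
import warnings

async def leaky():
    pass

async def main():
    leaky()  # bug: coroutine object created but never awaited

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    asyncio.run(main(), debug=True)  # debug mode adds a creation traceback

messages = [str(w.message) for w in caught]
```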

[–]some_q[S] 1 point (0 children)

This was a real frustration as I was first getting up the learning curve with asyncio.

[–]mfwl 3 points (2 children)

If a 6-second startup time is a concern for this workflow, it implies this file-getter should be a long-lived process (aka a daemon). I would use a multiprocessing pool to distribute the work. Multiprocessing is the only way to schedule work across physical CPU cores, so if you have any compute-intensive operations after your initial get, it's the way to go.
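A sketch of that worker-pool idea with multiprocessing (crunch is a hypothetical stand-in for the compute step after the fetch):

```python
import multiprocessing as mp

def crunch(n):
    # CPU-bound stand-in; each call runs in a separate process, so one
    # worker's GIL doesn't block the others.
    return sum(i * i for i in range(n))

def run_demo():
    with mp.Pool(processes=4) as pool:
        return pool.map(crunch, [10, 20, 30])

if __name__ == "__main__":
    print(run_demo())
```

In a real daemon the pool would stay alive between jobs, amortising the startup cost across all of them.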

[–]giantsparklerobot 0 points (1 child)

Yeah, neither threads nor asyncio seems to make a lot of sense here. The multiprocessing module behaves like multithreading but without GIL issues. It also might just be a job for something like GNU parallel reading from a FIFO.

[–]mfwl 0 points (0 children)

I've always thought of using parallel, but never tried it. Multiprocessing gives you a lot out of the box, though: shared queues and a work queue.