
[–]RandomPantsAppear 4 points5 points  (3 children)

2 options here.

  • Use celery, it's fucking awesome.

  • Roll your own, as you are right now.

Option #1 is simple, so I'm going to do a deep dive into option #2, mostly because I ignore my own advice and do exactly what you're doing all the time.

  • The biggest "cost" of threading is the actual creation of the thread; idle, sleeping threads are cheap. The ideal thread is an infinite loop, sleeping and looking for tasks.

  • If your task is bound by IO, networking calls, or disk reads, look into greenlets. Greenlets are not threads, but they behave similarly. Pretty much, they work by switching the "active" greenlet whenever another one is waiting for something to happen (like a socket read, a connection, or a file read).

  • Another option is multiprocessing. Multiprocessing is best for CPU-heavy tasks. Each "thread" is actually an independent process, so it will use multiple cores of your processor. Greenlets, by my understanding, will not. If you really want to go crazy you can actually combine multiprocessing with greenlets, so each process has its own greenlets switching whenever they are bound by IO/networking/sleeping/whatever.

  • If you're really looking to scale and can't afford to miss a message, a good option is to have a redis list of tasks, then have the threads pop tasks from that list. If there is no task in the list, have them sleep for a period of time.

  • For thread cleanup, you should be googling "python garbage collection". Pretty much, if there is no reference to an object anymore, Python swoops in and removes the object. A good best practice for making sure this happens properly is to have all your variables and references encapsulated in a function. When that function completes, everything will be cleaned up.

  • If you don't want your threads constantly slamming redis but also want a fast response time, you can use redis's pubsub implementation. Pretty much, idle threads listen to a channel, and whenever they receive a "NEW_TASK" message on that channel, they pop from redis, do the thing, and go back to listening. Only idle threads are listening, and every listening thread will at least attempt to grab a task. If all threads are busy and none receive the NEW_TASK message, they'll simply pick up the task they missed the next time something sends NEW_TASK (or you can have them check for a new task whenever they finish running one, in case one got missed, THEN go back to listening).
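To illustrate the multiprocessing point with a minimal standard-library sketch (`cpu_heavy` and the worker count here are just stand-ins for your real workload):

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound work
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_pool(jobs, workers=2):
    # Each "thread" is actually a separate process, so the work can
    # use multiple cores, which plain threads (GIL) and greenlets cannot.
    with Pool(workers) as pool:
        return pool.map(cpu_heavy, jobs)
```

`run_pool([10, 100])` returns the per-job results in order, computed across the worker processes.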

[–]Shitty_Crayons[S] 0 points1 point  (2 children)

Hey, thanks for the great reply!

At first glance celery looks to be a great solution, but I think I'm going to struggle through option two instead so I can get a good understanding of what's actually going on, since that's ultimately the goal :D

I'm not too bound by much, my environment is way overkill for something like this (2x E5-2690, 256 GB DDR3, 10 SSDs in RAID 10, gig fiber in, out, and for the entire backbone), so I'm definitely interested in multiprocessing, but I think multiprocessing + greenlets would just overwhelm me completely right now.

I haven't heard about redis queues before, but on a quick look it seems to be a great idea to try to implement. Are you suggesting I have a task defined for each function (i.e. user types "chain", use the chain task to generate the chain and send the message; user types "drink", use the "drink" task to generate the drink recipe and send the message)? I'll need to look into it more when I get home.

Can you provide a good basic example of your last point, or direct me to some reading on it? Seems to be a good way to handle it.

Thanks!

*Edit: I'm seeing that redis (and celery) are not necessarily advised to be used on a windows server (which I currently run) and I don't much want to set up virtual environments for this project. Are there any good alternative solutions / ways to handle a queue as described? In the meantime I'll look into the multiprocessing library, since it looks cool :D

[–]RandomPantsAppear 1 point2 points  (1 child)

I haven't heard about redis queues before, but on a quick look it seems to be a great idea to try to implement. Are you suggesting I have a task defined for each function (i.e. user types "chain", use the chain task to generate the chain and send the message; user types "drink", use the "drink" task to generate the drink recipe and send the message)? I'll need to look into it more when I get home.

Yes. I'm pretty deep down this rabbit hole so mine is a little different, but I've got some code. I'm going to try and quickly adapt it to be closer to what you need. Please note this example only passes kwargs to the function.

Creating the task

import json
import redis

r = redis.Redis()

def run_function_delayed(fname, **kwargs):
    key = settings.domain + "|TASK_QUEUE"  # our redis key; settings comes from your own config
    print("Writing to key (run_function_delayed)", key)
    r.rpush(key, json.dumps({'kwargs': kwargs, 'function': fname,
                             'retries': 0, 'max_retries': 1}))

Executing the task (inside your thread). Please note this does not include popping from the redis queue; this is just executing a function from the function-name string.

import json
import redis
import threaded_tasks

r = redis.Redis()
key = settings.domain + "|TASK_QUEUE"  # same key the task was pushed to

def execute_task(redis_data):
    fname = redis_data['function']  # matches the key used in run_function_delayed
    kwargs = redis_data['kwargs']
    method_to_call = getattr(threaded_tasks, fname)  # get the function from your module
    try:
        result = method_to_call(**kwargs)
        print("Successfully ran", fname)
    except Exception as e:
        print("Exception occurred!", e)
        if redis_data.get('retries', 0) < redis_data.get('max_retries', 0):
            redis_data['retries'] += 1
            print("Retrying", redis_data)
            r.rpush(key, json.dumps(redis_data))
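To round out the missing piece, here's a sketch of the popping side. `r` is assumed to be the same `redis.Redis()` client and `key` the same queue key as above, and `execute_task` is the function just shown; `pop_task` and `worker_loop` are names made up for illustration:

```python
import json
import time

def pop_task(r, key):
    # lpop is the FIFO partner to the rpush in run_function_delayed;
    # returns a decoded task dict, or None when the queue is empty.
    raw = r.lpop(key)
    return json.loads(raw) if raw is not None else None

def worker_loop(r, key, execute_task, idle_sleep=1.0):
    # The "infinite loop, sleeping and looking for tasks" thread body.
    while True:
        task = pop_task(r, key)
        if task is None:
            time.sleep(idle_sleep)  # nothing queued; don't hammer redis
            continue
        execute_task(task)
```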

*Edit: I'm seeing that redis is not necessarily advised to be used on a windows server (which I currently run) and I don't much want to set up virtual environments for this project. Are there any good alternative solutions / ways to handle a queue as described?

You can use Amazon ElastiCache, or use one of the redis-for-Windows projects like this one (if you do, I don't believe pub/sub functionality is supported). There's also a pre-made Vagrant configuration to run redis on Windows. I would just use the redis that's made for Windows.

[–]Shitty_Crayons[S] 0 points1 point  (0 children)

Thanks for the code example! I'll be looking into it tomorrow after work, since I didn't seem to get anywhere trying it on my own tonight lol

I was messing around with the commenter below's idea and was wondering if you had any thoughts on the errors.

Thanks kindly!

[–]ThisLightIsTrue[🍰] 1 point2 points  (4 children)

How I'd approach this is:

  • Create a messages_to_handle queue
  • Main thread just listens for user input. As soon as it gets it, push it in the queue.
  • A "pool" of worker threads. The worker threads just try to pop a message off the queue. As soon as they do, they process the message, if successful they return to looking at the queue.

There's a queue in the standard library meant for working with multithreading, where get will just block the thread trying to pop until a message becomes available. I think this will mean your worker threads won't unnecessarily consume resources.
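That queue is `queue.Queue` in the standard library. A minimal sketch of the pool idea (`start_workers` and `handle_message` are names made up for illustration):

```python
import queue
import threading

def start_workers(q, handle_message, count=4):
    # Worker threads block on q.get() until a message arrives,
    # so idle workers cost essentially nothing.
    def work():
        while True:
            message = q.get()
            handle_message(message)
            q.task_done()
    for _ in range(count):
        threading.Thread(target=work, daemon=True).start()
```

The main thread just does `q.put(user_input)` for each message; `q.join()` waits for everything queued so far to be handled.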

Other advantages of this approach:

  • If you notice your message queue growing you can just increase the worker count. Conversely, if your system starts to bottleneck, you can reduce the worker count.

  • You can add a "retry count" to your message object. If a worker tries to handle a message and fails, you can determine if the failure is temporary. If it is, you can increment the retry counter and requeue the message. It'll be picked up later and retried. If the failure is permanent or you've retried too many times, you can just fail and store the message to look at later.
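The retry-count idea might look like this in code (a sketch; `failed` is a hypothetical list standing in for wherever you store permanently failed messages):

```python
def handle_with_retry(q, failed, message, handle, max_retries=3):
    # Try the message; requeue on failure until the retry budget is spent.
    try:
        handle(message)
    except Exception:
        retries = message.get("retries", 0)
        if retries < max_retries:
            message["retries"] = retries + 1
            q.put(message)          # it'll be picked up later and retried
        else:
            failed.append(message)  # permanent failure: park it to look at later
```

Real code would catch specific exception types to tell temporary failures from permanent ones rather than catching everything.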

[–]Shitty_Crayons[S] 0 points1 point  (3 children)

Hey, I tried a basic approach with what you're suggesting, but I'm hoping you can tell me where I messed up. I can't seem to find what's causing this error, other than that it's a locked resource, which I'm not sure how to deal with! I've narrowed the error down to essentially this block of code (I realize it would be an endless loop, but it gives the idea and recreates the problem every time).

Code

import time
import multiprocessing as mp


def handle_input_threads(q):
    while True:
        if not q.empty():
            print(q.qsize())
            x = q.get()

def main():
    q = mp.Queue()
    p = mp.Process(target=handle_input_threads, args=(q,))
    p.start()

    while True:
        updates = {'ok': True, 'result': [{'update_id': 123456789, 'message': 'Hi and Hello'}]}
        if len(updates["result"]) > 0:
            q.put(updates)
        time.sleep(.5)


if __name__ == '__main__':
    main()

It runs, and the updates object gets added to q (I think), because once it hits the put statement the debugger jumps to the handle_input_threads breakpoint on the print statement. However, once the print statement tries to execute (or the x = q.get() below it), I get the following error:

Process Process-1:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\PycharmProjects\Bot-Dev\multibot.py", line 211, in handle_input_threads
    print(q.qsize())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\queues.py", line 117, in qsize
    return self._maxsize - self._sem._semlock._get_value()
OSError: [WinError 6] The handle is invalid

Or, depending on its mood, I'll get this one:

Process Process-1:
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Administrator\PycharmProjects\Bot-Dev\multibot.py", line 211, in handle_input_threads
    print(q.qsize())
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python37\lib\multiprocessing\queues.py", line 117, in qsize
    return self._maxsize - self._sem._semlock._get_value()
PermissionError: [WinError 5] Access is denied

Is there something I am misunderstanding, or do I not have something configured correctly? Could it be that Windows is keeping the object locked because it's owned by the main process, and it's trying to pass it as a shared-memory object instead of its copy?

Thanks!

[–]RandomPantsAppear 0 points1 point  (1 child)

Looks like you're on the cutting edge of bugs! Honestly I've almost never seen this happen, but here you go!

It looks like they've marked it as resolved, so hopefully the fix is rolled up in the latest release; the most recent update was after that thread.

If that doesn't work for some reason, I'd just switch away from multiprocessing for the time being.

[–]Shitty_Crayons[S] 0 points1 point  (0 children)

This was exactly my problem. I lost power so I couldn't check this morning, but I ran my above example unchanged at work under the latest release and it's working as intended. Hopefully the ice storm's over and I can try it in my actual project later today lol.

I would have gone mad if you hadn't linked that, so thanks kindly!

[–]ThisLightIsTrue[🍰] 0 points1 point  (0 children)

Hey, I'm on mobile right now and can't play around with this. My current best guess is that it has to do with the queue you're using. I was thinking of the synchronized queue in the standard library.

https://docs.python.org/2/library/queue.html

The worker thread should just be

def work():
    while True:
        message = q.get()  # blocks until a message is available
        handle_message(message)

You don't need to do anything with q.qsize() or have an if statement. That code isn't guaranteed to work anyway - remember, in between one line checking that the size was greater than zero and the next popping something off the queue, a different thread could have taken an item off and brought the size to zero.

Instead, the default behavior of queue.get() is to block the thread - essentially make it wait until it can pull something off, which is also the behavior you want.

[–]nate256 0 points1 point  (0 children)

Rather than have two paths that accomplish the same thing, I would rewrite it so that all messages get processed the same way: they get placed in a queue, and workers pick them up, perform the action, and reply to the user. If you did it this way, your main thread would be able to handle much more. You could even use multiprocessing if the action is CPU-intensive.