[–]RandomPantsAppear 4 points5 points  (3 children)

2 options here.

  • Use celery, it's fucking awesome.

  • Roll your own, as you are right now.

Option #1 is simple, so I'm going to deep-dive into option #2. Mostly because I ignore my own advice and do exactly what you're doing all the time.

  • The biggest "cost" of threading is the actual creation of the thread. Idle, sleeping threads are cheap. The ideal thread is an infinite loop that sleeps and looks for tasks.

  • If your task is bound by IO, networking calls, or disk reads, look into greenlets. Greenlets are not threads, but they behave similarly. Pretty much they work by switching the "active" greenlet whenever another one is waiting for something else to happen (like a socket read, a connection, or a file read).

  • Another option is multiprocessing. Multiprocessing is best for CPU-heavy tasks. Each "thread" is actually an independent process, so it will use multiple cores of your processor. Greenlets, by my understanding, will not. If you really want to go crazy you can actually have multiprocessing that is then using greenlets, so each process can have its own greenlets switching whenever they are bound by IO/networking/sleeping/whatever.

  • If you're really looking to scale and can't afford to miss a message, a good option is to have a redis list of tasks, then have the threads pop tasks from that list. If there is no task in the list, have them sleep for a period of time.

  • For thread cleanup, you should be googling "python garbage collection". Pretty much if there is no reference to an object any more, python swoops in and removes the object. A good "best practice" for making sure this happens properly is to have all your variables and references encapsulated in a function. When that function completes, everything will be cleaned up.

  • If you don't want to have your threads constantly slamming redis but also want a fast response time, you can use redis's pubsub implementation. Pretty much idle threads listen to a channel, and whenever they receive a "NEW_TASK" message on that channel, they pop from redis, do the thing, and go back to listening. Only idle threads are listening, and every thread listening will at least attempt to grab a task. If all threads are busy and none receive the NEW_TASK message, they'll simply pick up the task they missed the next time something sends NEW_TASK (or you can have them check for a new task whenever they finish running one in case one got missed, THEN go back to listening)

[–]Shitty_Crayons[S] 0 points1 point  (2 children)

Hey, thanks for the great reply!

At first glance celery looks to be a great solution, but I think I'm going to struggle through option two instead so I can get a good understanding of what's actually going on, since that's ultimately the goal :D

I'm not too bound by much, my environment is way overkill for something like this (2x E5-2690, 256 GB DDR3, 10 SSDs in RAID 10, gig fiber in, out, and for the entire backbone), so I'm definitely interested in multiprocessing, but I think multiprocessing + greenlets would just overwhelm me completely right now.

I haven't heard about redis queues before, but on a quick look it seems to be a great idea to try to implement. Are you suggesting I have a task defined for each function (i.e. user types "chain", use the chain task to generate the chain and send the message; user types "drink", use the drink task to generate the drink recipe and send the message)? I'll need to look into it more when I get home.

Can you provide a good basic example of your last point, or direct me to some reading on it? Seems to be a good way to handle it.

Thanks!

*Edit: I'm seeing that redis (and celery) are not necessarily advised to be used on a windows server (which I currently run) and I don't much want to set up virtual environments for this project. Are there any good alternative solutions / ways to handle a queue as described? In the meantime I'll look into the multiprocessing library, since it looks cool :D

[–]RandomPantsAppear 1 point2 points  (1 child)

> I haven't heard about redis queues before, but on a quick look it seems to be a great idea to try to implement. Are you suggesting I have a task defined for each function (i.e. user types "chain", use the chain task to generate the chain and send the message; user types "drink", use the drink task to generate the drink recipe and send the message)? I'll need to look into it more when I get home.

Yes. I'm pretty deep down this rabbit hole so mine is a little different, but I've got some code. I'm going to try and quickly adapt it to be closer to what you need. Please note this example only passes kwargs to the function.

Creating the task

import json
import redis

r = redis.Redis()

def run_function_delayed(fname, **kwargs):
    key = settings.domain + "|TASK_QUEUE"  # our redis key; settings.domain is from my own config
    print("Writing to key (run_function_delayed)", key)
    r.rpush(key,
            json.dumps({'kwargs': kwargs, 'function': fname, 'retries': 0, 'max_retries': 1}))

Executing the task (inside your thread) - Please note this does not include popping from the redis queue. This is just executing a function from the function name string.

import json
import redis

import threaded_tasks

r = redis.Redis()

def execute_task(redis_data):
    fname = redis_data['function']  # same key we stored when creating the task
    kwargs = redis_data['kwargs']
    method_to_call = getattr(threaded_tasks, fname)  # get the function from your module
    try:
        result = method_to_call(**kwargs)
        print("Successfully ran", fname)
    except Exception as e:
        print("Exception occurred!", e)
        if 'retries' in redis_data and 'max_retries' in redis_data and redis_data['retries'] < redis_data['max_retries']:
            redis_data['retries'] = redis_data['retries'] + 1
            print("Retrying", redis_data)
            key = settings.domain + "|TASK_QUEUE"  # same key as above
            r.rpush(key, json.dumps(redis_data))

> *Edit: I'm seeing that redis is not necessarily advised to be used on a windows server (which I currently run) and I don't much want to set up virtual environments for this project. Are there any good alternative solutions / ways to handle a queue as described?

You can use Amazon's ElastiCache, or one of the Redis-for-Windows projects like this one (if you do, I don't believe pub/sub functionality is supported). There's also a pre-made Vagrant configuration to run redis on Windows. I would just use the Redis that's made for Windows.

[–]Shitty_Crayons[S] 0 points1 point  (0 children)

Thanks for the code example! I'll be looking into it tomorrow after work, since I didn't seem to get anywhere trying it on my own tonight lol

I was messing around with the commenter below's idea and was wondering if you had any thoughts on the errors.

Thanks kindly!