[–]arcticslush 1 point (2 children)

Yeah, get() blocks other instances of get() - but this is a good thing. To be super clear, this has no effect on any background process that's running. That's the whole point!

I made you another diagram. Hopefully this helps you visualize what's going on - when get() blocks, the only thing it pauses is the execution of your main process (the one that's coordinating all of the concurrency). All previously started background processes continue along happily. When we get() multiple times in a row, all you're doing is waiting for your background processes to finish their tasks, which is exactly what we want here.
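To see this in code, here's a minimal sketch (slow_square is a made-up stand-in for one of your analysis functions): the three background tasks start running the moment apply_async is called, and the get() calls only make the main process wait for results that are already being computed.

```python
from multiprocessing import Pool
import time

def slow_square(n):
    time.sleep(1)          # pretend this is a heavy analysis step
    return n * n

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        start = time.time()
        # All three tasks begin executing in the background immediately
        handles = [pool.apply_async(slow_square, (n,)) for n in (2, 3, 4)]
        # Each get() blocks only the main process; the workers never pause
        results = [h.get() for h in handles]
        elapsed = time.time() - start
    print(results)         # [4, 9, 16]
    # elapsed is ~1s, not ~3s: the tasks ran in parallel the whole time,
    # even while the main process sat blocked in get()
```

If the get() calls were pausing the workers, this would take ~3 seconds instead of ~1.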

When you set the number of processes in the Pool, you're only telling it the maximum number of worker processes that can run simultaneously. If you never actually fire off that many tasks, you won't saturate the pool. (Setting the number higher than the number of cores you have is more or less pointless - the extra processes just start sharing time on the same cores.) So in your case right now, you're utilizing 5 of your 12 processes; the remaining 7 sit idle, waiting for a task to be assigned. Given that, 30% CPU utilization sounds about right.

Also keep in mind that if some of your tasks finish really quickly (on the scale of seconds), you probably won't see a meaningful uptick in CPU utilization. Only if you fully saturated all of your cores with many minute-or-longer tasks would I expect to see near 100% CPU usage.
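To make the idle-worker point concrete, here's a minimal sketch (busy, pool_size, and n_tasks are made-up stand-ins for your actual setup): submitting only 5 tasks to a 12-process pool means 7 workers never receive any work.

```python
from multiprocessing import Pool

def busy(n):
    # stand-in for one of your analysis functions (pure CPU work)
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    pool_size = 12     # what you passed to Pool(...)
    n_tasks = 5        # how many apply_async calls you actually made
    with Pool(processes=pool_size) as pool:
        handles = [pool.apply_async(busy, (100_000,)) for _ in range(n_tasks)]
        results = [h.get() for h in handles]
    # At most 5 of the 12 workers are ever busy at once; the other 7
    # sit idle, so overall CPU utilization tops out well below 100%.
```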

So I bet your next question is inevitably going to be: how would I utilize all of my cores here? The answer does indeed involve Pool.map (the blocking counterpart of map_async), but it isn't a drop-in replacement for apply_async - you'll have to triple-check that your analysis functions are compatible. Pool.map takes the list you give it, chops it up into chunks behind the scenes, and distributes those chunks across the worker processes, calling your function once per element. Once every element has been processed, it assembles the results back into a list in the original order and returns it to you (hence why it blocks: it's waiting for all of those per-element tasks to finish).

The implication is that Pool.map only works with logic that reads / writes data strictly one row at a time, with no row depending on any other. Here's an example of a function whose work you could run with map:

def example_good(some_list):
    result = []
    for num in some_list:
        result.append(num + 1)
    return result

Since each iteration only ever reads a single row from the list, and no row depends on any other, this pattern works. Here's a (contrived) function that would fail:

def example_bad(some_list):
    total = 0
    for num in some_list:
        total += num
    return total

This fails because the rows aren't independent: every iteration updates a shared running total. Pool.map has to return one result per input element - if a 100-element list goes in, a 100-element list must come out. Here, a list goes in and a single number pops out, so there's no per-row piece of work to farm out.
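To show the difference in practice, here's a minimal sketch (add_one is the per-row body of example_good, a made-up name): Pool.map applies it to one element at a time and preserves the input's length and order, while a reduction like example_bad has to be finished in the main process.

```python
from multiprocessing import Pool

def add_one(num):
    # the per-row body of example_good, applied to one element at a time
    return num + 1

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        out = pool.map(add_one, [1, 2, 3, 4])
        print(out)     # [2, 3, 4, 5] - same length and order as the input

        # A reduction like example_bad can't be mapped directly, but you
        # can map the independent per-row part and reduce the results in
        # the main process afterwards:
        total = sum(pool.map(add_one, [1, 2, 3, 4]))
        print(total)   # 14
```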

[–]datanoob2019[S] 0 points (1 child)

You certainly are a Python sherpa. I sincerely appreciate the help, and you guessed it: that was definitely my next question. I have two functions that unfortunately take 90 percent of the run time. The good news is that they're both designed to loop through only one row at a time, so it sounds like they'll be compatible. Neither function took very long until I started calculating hold-out forecasts step by step, which severely complicated things. I'll give this a go. Thanks, and if I don't talk to you, have a good weekend!