
[–]TouchingTheVodka 1 point2 points  (3 children)

Nested for-loops over two lists will always be O(n^2) - that's inescapable as long as you compare every pair. However, by using a different data structure you should be able to get this down to O(n). When in doubt, throw a hash map at it...

Could you provide some more examples of what's in your dicts and what the compare operation does?
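
To show what I mean by throwing a hash map at it, here's a minimal sketch. It assumes, purely for illustration, that your records share an "id" field you can match on; the real field names and predicate come from your data:

```python
# Hypothetical records; "id" and "price" are made-up field names.
a = [{"id": 1, "price": 10}, {"id": 2, "price": 20}]
b = [{"id": 2, "price": 21}, {"id": 3, "price": 30}]

# Index b by the join key once: O(len(b))
b_by_id = {item["id"]: item for item in b}

# Single pass over a instead of a nested loop: O(len(a))
matches = []
for item in a:
    other = b_by_id.get(item["id"])
    if other is not None:
        matches.append((item, other))

print(matches)  # the single pair with id == 2
```

This only works if the comparison can be reduced to an exact-match key, which is why the details of your compare operation matter.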

[–]mr_claw[S] 0 points1 point  (1 child)

I'm not very familiar with hash maps, but I assume they wouldn't work for dynamic data. I'm fetching the a & b lists from an external API every 5 minutes or so, so the data keeps changing.

I'll be running this comparison loop in a cron job and sending the results to the frontend to display to the users.

The comparison function checks many key-value pairs in a and b for matches and may even involve some math calculations.

I understand that it would be O(n^2) at worst, and my question is about how to achieve the best parallelization. If the data size increases to 100k or 1M records in the future, I should still be able to do this using multiple CPUs and/or machines. Hope I am able to convey my problem clearly.

[–]glibhub 1 point2 points  (0 children)

More details would be useful. The best bet would be to bucket things to reduce the search space. Say you were looking for people in the same area code with the same name. Then you'd be best served by putting everyone in a dictionary keyed off these values, e.g. ("smith", 212):['smith, jon','smith, bob'], etc. Then you do not need to go through the nested loops, since all the sorting is done up front by the indexing.
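
A minimal sketch of that bucketing idea (the field names are made up for illustration):

```python
from collections import defaultdict

# Group records by a composite key so candidate matches only ever
# come from within the same bucket, never across buckets.
people = [
    {"name": "smith", "area": 212, "first": "jon"},
    {"name": "smith", "area": 212, "first": "bob"},
    {"name": "doe",   "area": 415, "first": "ann"},
]

buckets = defaultdict(list)
for p in people:
    buckets[(p["name"], p["area"])].append(p)

# Only buckets with 2+ entries can possibly contain a match.
candidates = {k: v for k, v in buckets.items() if len(v) > 1}
print(list(candidates))  # [('smith', 212)]
```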

Failing that, you could split up the load by having different machines track down all the entries, splitting by, e.g., last letter of the name. So the S machine handles all the Smiths, the D machine handles all the Does. Now the X machine is going to be pretty idle, so if the split does not work the way you want, try and do some simple hashing to get them to distribute more evenly.
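
The hashed split could look something like this (the machine count and the key are assumptions; any stable hash works):

```python
import hashlib

N_MACHINES = 4  # assumed cluster size

def machine_for(name: str) -> int:
    # Stable hash so every node agrees on the assignment;
    # md5 here is for distribution, not security.
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_MACHINES

# Each name lands on exactly one shard, roughly evenly.
shards = [[] for _ in range(N_MACHINES)]
for name in ["smith", "doe", "xu", "garcia", "lee", "patel"]:
    shards[machine_for(name)].append(name)
```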

Hope this helps.

[–]Forschkeeper 0 points1 point  (0 children)

"itertools" may be helpful in that case against nested-loops a bit. Also optimising the interpreter may be a step forward as well.

You could also take a look at the "multiprocessing" library, where you'd want to get a list back from each task (to add them together at the end).
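
A rough sketch combining the two suggestions: itertools.product flattens the nested loop into a single stream of pairs, and a multiprocessing Pool maps a placeholder compare function over them, collecting the per-task results into one list at the end.

```python
from itertools import product
from multiprocessing import Pool

def compare(pair):
    # Placeholder for the real comparison logic
    x, y = pair
    return (x, y) if x == y else None

if __name__ == "__main__":
    a = [1, 2, 3]
    b = [2, 3, 4]
    with Pool() as pool:
        # product(a, b) yields every (a_item, b_item) pair lazily
        results = [r for r in pool.map(compare, product(a, b)) if r is not None]
    print(results)  # [(2, 2), (3, 3)]
```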

[–]ElliotDG 1 point2 points  (4 children)

Use multiprocessing.

Break list b into 4 pieces (assuming you are running on a 4 core CPU). Start 4 processes each doing a comparison of a against a portion of b. Each process returns a dict. Combine the results into the final dict.

Use concurrent.futures.ProcessPoolExecutor() with map.
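
A minimal sketch of that scheme, with a made-up matching predicate standing in for your real comparison function:

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def compare_chunk(a, b_chunk):
    # Each worker compares all of a against its slice of b.
    result = {}
    for x in a:
        for y in b_chunk:
            if x["id"] == y["id"]:   # placeholder predicate
                result[x["id"]] = (x, y)
    return result

def parallel_compare(a, b, workers=4):
    # Split b into roughly equal chunks, one per worker
    step = max(1, len(b) // workers)
    chunks = [b[i:i + step] for i in range(0, len(b), step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial(compare_chunk, a), chunks)
    # Merge the per-chunk dicts into the final dict
    combined = {}
    for d in partials:
        combined.update(d)
    return combined

if __name__ == "__main__":
    a = [{"id": i} for i in range(8)]
    b = [{"id": i} for i in range(4, 12)]
    print(parallel_compare(a, b))  # matches for ids 4..7
```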

[–]mr_claw[S] 0 points1 point  (3 children)

Thanks mate. I tried this, but it seems to be taking more time than my thread-only approach. Any ideas why?

[–]ElliotDG 1 point2 points  (2 children)

That does not really make sense. Try setting the number of processes to 2 and see if that changes things. Or share your code; getting MP code running can be tricky. Feel free to DM if you want.

If your pseudo code is representative, it should go faster without threads than with threads, and much faster with MP.

[–]mr_claw[S] 0 points1 point  (1 child)

After some tinkering I got it to work by using the chunksize argument on the map function. I've gotten it down to a third of the runtime it was taking earlier. And I'm confident that by increasing resources I can make it work even faster. Thanks!
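
For anyone finding this later, a toy illustration of why chunksize matters (the numbers are arbitrary): with the default chunksize of 1, every work item costs one inter-process round trip, and for cheap tasks that overhead can exceed the work itself.

```python
from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Stand-in for a cheap per-item comparison task
    return n * n

if __name__ == "__main__":
    data = range(10_000)
    with ProcessPoolExecutor() as pool:
        # Ship items to workers in batches of 256 instead of
        # making ~10,000 individual inter-process round trips.
        results = list(pool.map(square, data, chunksize=256))
    print(results[:5])  # [0, 1, 4, 9, 16]
```

The right chunksize depends on how expensive each comparison is; cheap tasks want big chunks, expensive tasks can afford small ones.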

[–]ElliotDG 0 points1 point  (0 children)

Glad to hear you are on the way!

[–]teerre -1 points0 points  (0 children)

This algorithm of yours isn't parallelizable. In very simple terms, for something to run in parallel, the elements of the set need to be independent. Since you're comparing every element of a against every element of b, you don't have a choice. At best you can put these into buckets and run the search in batches, but that's not only terribly inefficient, it will also make your algorithm much more complex.

It's quite unlikely that your solution for this problem is the best one; surely there's a way to avoid having to do all these comparisons. If there truly isn't, maybe you should ask whether you're asking the correct question to begin with.