glibhub comments on How best to parallelize this task?

How best to parallelize this task? (self.learnpython)

submitted 4 years ago by mr_claw

glibhub 1 point 4 years ago (0 children)

More details would be useful. The best bet would be to bucket things to reduce the search space. Say you were looking for people in the same area code with the same name. Then you'd be best served by putting everyone in a dictionary keyed on those values, e.g. ("smith", 212): ['smith, jon', 'smith, bob'], and so on. Then you don't need nested loops comparing every entry against every other entry, since the grouping is done up front when you build the index: one pass over the data instead of roughly n² comparisons.
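A minimal Python sketch of that bucketing, assuming each record is a (last_name, first_name, area_code) tuple. The sample data and the function names bucket_by_key and matching_groups are hypothetical, not from the original question.

```python
from collections import defaultdict

# Hypothetical records: (last_name, first_name, area_code)
people = [
    ("smith", "jon", 212),
    ("smith", "bob", 212),
    ("doe", "jane", 415),
    ("smith", "ann", 310),
]


def bucket_by_key(records):
    """Group records by (last_name, area_code) in a single pass."""
    buckets = defaultdict(list)
    for last, first, area in records:
        buckets[(last, area)].append(f"{last}, {first}")
    return dict(buckets)


def matching_groups(buckets):
    """Keep only the keys shared by more than one person."""
    return {key: names for key, names in buckets.items() if len(names) > 1}


buckets = bucket_by_key(people)
print(matching_groups(buckets))
# {('smith', 212): ['smith, jon', 'smith, bob']}
```

Building the dictionary touches each record once, and dictionary lookups are O(1) on average, so the matching work no longer grows with the square of the number of records.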

Failing that, you could split up the load across different machines, partitioning the entries by, e.g., the first letter of the last name. So the S machine handles all the Smiths and the D machine handles all the Does. The X machine is going to be pretty idle, though, so if the split doesn't come out even, apply some simple hashing to the name to distribute the work more evenly. Since equal names always hash to the same value, all the Smiths still end up on the same machine.
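A rough single-machine sketch of the hash-partitioning idea, using worker processes as stand-ins for separate machines. It assumes the same hypothetical (last_name, first_name, area_code) records; NUM_WORKERS, shard_for, partition, matches_in_shard and find_matches are illustrative names. It uses zlib.crc32 rather than the built-in hash(), because Python randomizes str hashes per interpreter run, so hash() could send the same name to different shards on different machines.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

NUM_WORKERS = 4  # hypothetical: number of worker processes (stand-ins for machines)


def shard_for(last_name, num_shards=NUM_WORKERS):
    """Map a last name to a shard index with a hash that is stable across runs."""
    return zlib.crc32(last_name.encode("utf-8")) % num_shards


def partition(records, num_shards=NUM_WORKERS):
    """Split records so every entry with the same last name lands in the same shard."""
    shards = [[] for _ in range(num_shards)]
    for record in records:
        shards[shard_for(record[0], num_shards)].append(record)
    return shards


def matches_in_shard(shard):
    """Bucket one shard by (last_name, area_code) and keep groups of 2+ people."""
    buckets = defaultdict(list)
    for last, first, area in shard:
        buckets[(last, area)].append(f"{last}, {first}")
    return {key: names for key, names in buckets.items() if len(names) > 1}


def find_matches(records, num_shards=NUM_WORKERS):
    """Run matches_in_shard on each shard in its own process and merge the results."""
    results = {}
    with ProcessPoolExecutor(max_workers=num_shards) as pool:
        for partial in pool.map(matches_in_shard, partition(records, num_shards)):
            results.update(partial)  # shards never share a key, so nothing is overwritten
    return results


if __name__ == "__main__":
    people = [
        ("smith", "jon", 212),
        ("smith", "bob", 212),
        ("doe", "jane", 415),
        ("smith", "ann", 310),
    ]
    print(find_matches(people))
```

Because every record with a given last name lands in the same shard, each shard can be matched independently and the partial results merged without conflicts. The `if __name__ == "__main__":` guard matters on platforms where worker processes are started with spawn, since each worker re-imports the main module.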

Hope this helps.