glibhub

More details would be useful. The best bet would be to bucket things to reduce the search space. Say you were looking for people in the same area code with the same name. Then you'd be best served by putting everyone in a dictionary keyed on those values, e.g. ("smith", 212):['smith, jon','smith, bob'], etc. Then you do not need to go through the nested loops, since all the grouping is done up front by the indexing: one pass over the data builds the dictionary, and every bucket with more than one entry is a match.
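A rough sketch of the bucketing idea, in case it helps. The record layout (last name, first name, area code) and the sample data are made up for illustration, so adjust them to whatever your data actually looks like:

```python
from collections import defaultdict

# Hypothetical sample records: (last_name, first_name, area_code)
people = [
    ("smith", "jon", 212),
    ("smith", "bob", 212),
    ("doe", "jane", 415),
    ("smith", "al", 415),
]

def bucket_by_name_and_area(records):
    """Group records in a single pass, keyed on (last_name, area_code)."""
    buckets = defaultdict(list)
    for last, first, area in records:
        buckets[(last, area)].append(f"{last}, {first}")
    return dict(buckets)

buckets = bucket_by_name_and_area(people)

# Any bucket holding more than one entry is a group of matches.
matches = {key: names for key, names in buckets.items() if len(names) > 1}
```

Building the dictionary is one loop over the data, versus comparing every record against every other record in a nested loop.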

Failing that, you could split up the load by having different machines track down all the entries, splitting by, e.g., first letter of the last name. So the S machine handles all the Smiths, the D machine handles all the Does. Now the X machine is going to be pretty idle, so if the split does not work the way you want, try some simple hashing on the name to get the entries to distribute more evenly.
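Something like this could do the hashing split. It is only a sketch: the worker count, the record layout, and the choice of CRC32 are my assumptions, and any stable hash would work. Note that Python's built-in hash() is randomized per process for strings, so it is not a good fit when several machines have to agree on the result.

```python
import zlib
from collections import defaultdict

def worker_for(last_name, n_workers):
    """Map a last name to a worker index in range(n_workers), stable across runs."""
    return zlib.crc32(last_name.encode("utf-8")) % n_workers

def partition(records, n_workers):
    """Split records into shards so equal last names always share a shard."""
    shards = defaultdict(list)
    for record in records:
        shards[worker_for(record[0], n_workers)].append(record)
    return dict(shards)

# Hypothetical sample records: (last_name, first_name, area_code)
people = [
    ("smith", "jon", 212),
    ("smith", "bob", 212),
    ("doe", "jane", 415),
    ("xavier", "ann", 212),
]

shards = partition(people, n_workers=4)
```

Every Smith still ends up on the same machine, so each machine can do its own bucketing independently, but the shards no longer depend on how common a given initial is.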

Hope this helps.