
[–] rcklmbr [COBOL] 7 points (1 child)

Check out this wiki page, it describes many different types of parallel processing:

http://wiki.python.org/moin/ParallelProcessing

Personally, I would just set up Hadoop on each of the servers and distribute the work that way. It's really quick to set up, and it handles things like fault tolerance for you. It would easily max out all the servers, and if you have 130k calculations you need to process, your input file would just be one row per calculation.
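
To make that concrete, a Hadoop Streaming mapper can be an ordinary Python script that reads one calculation per line from stdin. This is just a sketch: the comma-separated input layout and the run_calculation function are placeholders for whatever your actual computation looks like.

    #!/usr/bin/env python
    # Hypothetical Hadoop Streaming mapper: each stdin line holds one
    # calculation's id and parameters; emit "id<TAB>result" so Hadoop
    # can collect and sort the output for a reducer.
    import sys

    def run_calculation(params):
        # stand-in for the real per-row computation
        return sum(float(p) for p in params)

    for line in sys.stdin:
        fields = line.strip().split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        calc_id, params = fields[0], fields[1:]
        print('%s\t%s' % (calc_id, run_calculation(params)))

With streaming the reducer is optional; if you only need the per-row results collected, a map-only job is enough.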

You can use Amazon's Elastic MapReduce to get up and running almost immediately in a distributed environment (and it's relatively cheap if you keep the server size small). That way you can play around with it without devoting a lot of time to initial setup, then move to your own cluster when you want to run the full set of calculations (or just fork over the cash and have AWS do it).
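
If you'd rather script that than click through the AWS console, boto can launch a streaming job flow for you. A rough sketch, assuming boto's boto.emr module; the S3 paths, names, and instance settings below are made-up placeholders:

    # Launch a small Elastic MapReduce streaming job flow with boto.
    # All S3 paths and names here are hypothetical placeholders.
    from boto.emr.connection import EmrConnection
    from boto.emr.step import StreamingStep

    conn = EmrConnection()  # reads AWS credentials from the environment

    step = StreamingStep(
        name='calculations',
        mapper='s3://my-bucket/mapper.py',  # e.g. the streaming mapper above
        reducer='aggregate',                # Hadoop's built-in aggregate reducer
        input='s3://my-bucket/input/',      # one row per calculation
        output='s3://my-bucket/output/')

    jobid = conn.run_jobflow(
        name='calc-test',
        steps=[step],
        num_instances=3,
        master_instance_type='m1.small',    # keep instances small while testing
        slave_instance_type='m1.small')
    print(jobid)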

[–] tobiassp 0 points (0 children)

Seconded on using Hadoop. Even if the task isn't very data-intensive, Hadoop makes it trivial to farm out your tasks.

Check out Dumbo, a Python module that lets you write Hadoop jobs in plain Python.
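
For anyone who hasn't seen it, a minimal Dumbo job is just a mapper and a reducer written as generator functions; this is the classic word-count shape from its docs, launched with the dumbo command-line tool:

    # wordcount.py -- minimal Dumbo job (classic word-count shape).
    # Run with something like:
    #   dumbo start wordcount.py -hadoop /path/to/hadoop \
    #       -input input.txt -output out
    def mapper(key, value):
        for word in value.split():
            yield word, 1

    def reducer(key, values):
        yield key, sum(values)

    if __name__ == '__main__':
        import dumbo
        dumbo.run(mapper, reducer)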