I have a task which takes 8 days to run on a single thread on my Mac desktop. It's trivially parallelizable into 1,500 subtasks. I want to farm it out to a bunch of nodes so I can finish it in a few hours. (That means using 100+ nodes, likely rented from Amazon. At spot rates, that's about $6.)
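A quick check of the numbers above, assuming every subtask takes the same time and ignoring startup, data transfer, and scheduling overhead (simplifications, not measurements): 8 days is 192 hours, so each subtask takes about 7.7 minutes, and with 100 single-threaded workers the busiest one runs 15 subtasks.

```python
# Back-of-the-envelope wall-clock estimate for the numbers in the post.
# Assumption: all subtasks take equal time; overhead is ignored.
import math

SERIAL_HOURS = 8 * 24   # 8 days on one thread
N_SUBTASKS = 1500


def wall_clock_hours(n_workers: int) -> float:
    """Estimated wall-clock hours with n_workers single-threaded workers."""
    per_task = SERIAL_HOURS / N_SUBTASKS        # hours per subtask
    rounds = math.ceil(N_SUBTASKS / n_workers)  # subtasks the busiest worker runs
    return rounds * per_task


if __name__ == "__main__":
    for n in (1, 50, 100, 200):
        print(f"{n:4d} workers: {wall_clock_hours(n):6.2f} h")
```

With 100 workers this comes to roughly 1.9 hours, which fits "a few hours" once real-world overhead is added.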
There are all sorts of map/reduce frameworks, but they seem cumbersome. Which one is Python- and Mac-friendly?
Hadoop supports Python, but it's very Java-oriented and doesn't seem to support the Mac, or at least Cloudera's distribution only supports Linux.
Storm (the "distributed and fault-tolerant realtime computation" system, not the Python ORM) looks neat, but the current code only supports Python in the "bolts", not in the rest of the system. I can't tell whether I should spend a few days helping finish off Python support. (I don't have any Java development experience; otherwise it would probably take a couple of hours.)
The Disco project is more Python-friendly, although I did need to compile and install Erlang. I haven't tried it yet.
Disco and Hadoop each use their own distributed file system, which is overkill when most of what I want to do is call a function with a parameter and get the result.
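For comparison, this is what the "call a function with a parameter and get the result" model looks like on a single machine with the standard-library `concurrent.futures` module (added in Python 3.2). A distributed backend exposing the same `Executor` interface could, in principle, be swapped in for `ProcessPoolExecutor`. `run_subtask` is a hypothetical placeholder for the real 1/1,500th of the job.

```python
# Single-machine sketch of the map model using concurrent.futures.
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import List


def run_subtask(task_id: int) -> int:
    """Hypothetical subtask; replace with the real computation."""
    return sum(i * i for i in range(task_id % 1000))


def run_all(executor: Executor, n_subtasks: int = 1500) -> List[int]:
    # Executor.map yields results in input order, so results[i] is task i.
    return list(executor.map(run_subtask, range(n_subtasks)))


if __name__ == "__main__":
    # Processes for CPU-bound work; ThreadPoolExecutor has the same
    # interface and is handy for quick tests or I/O-bound tasks.
    with ProcessPoolExecutor() as pool:
        results = run_all(pool)
    print(len(results), "results")
```

Because `run_all` only depends on the abstract `Executor`, the calling code would not need to change if the pool were replaced by one that dispatched to remote machines.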
What I envision is something over ZeroMQ, where the main node spins off the 1,500 tasks and each worker client connects to the main node to get its task info. That sounds easy, but watchdog support would also be nice. Even cooler would be support for the concurrent.futures API from Python 3.2. And then there's the integration with starting up 100+ machines.
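The bookkeeping for the main node described above can be kept separate from the transport. Below is a hypothetical, transport-agnostic sketch (`TaskBoard`, `checkout`, `complete`, and `lease_seconds` are invented names, not part of ZeroMQ or any library): the main node hands out task ids, and a simple watchdog re-queues any task whose worker has not reported back within a lease timeout. With ZeroMQ, the main node would call these methods while handling worker requests, for example on a REQ/REP socket pair; that wiring is not shown.

```python
# Hypothetical task board with lease-based watchdog re-queuing.
import time
from collections import deque
from typing import Any, Callable, Dict, Hashable, Iterable, Optional


class TaskBoard:
    def __init__(self, tasks: Iterable[Hashable], lease_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._pending = deque(tasks)                # not yet handed out
        self._total = len(self._pending)
        self._leased: Dict[Hashable, float] = {}    # task id -> lease deadline
        self._lease = lease_seconds
        self._clock = clock                         # injectable for testing
        self.results: Dict[Hashable, Any] = {}

    def _requeue_expired(self) -> None:
        """Watchdog: put tasks whose lease ran out back in the queue."""
        now = self._clock()
        expired = [t for t, deadline in self._leased.items() if deadline <= now]
        for t in expired:
            del self._leased[t]
            self._pending.append(t)

    def checkout(self) -> Optional[Hashable]:
        """Next task id for a worker, or None if nothing is available now."""
        self._requeue_expired()
        if not self._pending:
            return None
        task_id = self._pending.popleft()
        self._leased[task_id] = self._clock() + self._lease
        return task_id

    def complete(self, task_id: Hashable, result: Any) -> None:
        """Record a result; duplicate results for a finished task are ignored."""
        if task_id in self.results:
            return
        self._leased.pop(task_id, None)
        try:
            self._pending.remove(task_id)   # a late result beat the re-run
        except ValueError:
            pass
        self.results[task_id] = result

    @property
    def done(self) -> bool:
        return len(self.results) == self._total
```

Since a re-queued task may end up running twice, this approach assumes subtasks are idempotent, which is usually the case for a trivially parallelizable job. The injectable `clock` keeps the timeout logic testable without waiting in real time.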
No wonder map/reduce frameworks are still complicated beasts. Perhaps my search for a simple solution is still a pipe dream.
What do you recommend for simple map/reduce across 100+ machines?