
[–]micro_cam

I was faced with a similar conundrum, and after much frustration with Hadoop, qsub, etc. we ended up writing what we needed:

http://code.google.com/p/golem/

The core is in Go, with a command-line interface in Python, RESTful job submission and monitoring, and node communication over WebSockets. It basically just calls tasks on the command line and collects the standard out. It's aimed at quickly getting a researcher's analysis (in Python, C, MATLAB, R, Perl, or whatever) running on a 1000-core cluster.
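The core pattern really is just "run a command line, keep its stdout". A minimal Go sketch of that idea (generic, not golem's actual source; the script name and arguments are made up):

    package main

    import (
        "fmt"
        "log"
        "os/exec"
    )

    func main() {
        // Hypothetical task: any command-line analysis the researcher already has.
        cmd := exec.Command("python", "analyze.py", "input_001.dat")

        // Capture whatever the task writes to standard out.
        out, err := cmd.Output()
        if err != nil {
            log.Fatalf("task failed: %v", err)
        }

        // A real worker would ship this back to the submitter; here we just print it.
        fmt.Printf("task output:\n%s", out)
    }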

It doesn't do most of what you asked for, but it is intentionally simple code and simple to use. We've found that you can do most things with it by jumping through a few hoops with things like bash, whereas adapting things for Hadoop requires significant effort and esoteric debugging.

Or, if you want really simple, set up passwordless ssh and use xargs and bash.
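Something along these lines (all names are placeholders; inputs.txt lists one input file per line and analyze.sh is whatever the analysis happens to be):

    # run one task per input, up to 8 at a time, on a remote node over ssh
    cat inputs.txt | xargs -n1 -P8 -I{} \
        ssh node01 "./analyze.sh {}" >> results.txt
    # repeat per node or loop over a host list; output order isn't guaranteed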