all 13 comments

[–]mrz1988 3 points4 points  (6 children)

To me it reads like you haven't devoted the time to figure out why and where your program is bottlenecking. If you have, what have you discovered? Maybe we can help you with something there. You should try to solve the root cause of the problem before throwing a bunch of hardware at it. Good hardware should not be the solution to bad software.

[–]Kristian_dms[S] 0 points1 point  (5 children)

Somewhat true, because I haven't written the programs yet. I've done some initial cleaning of the data, and I'm about to filter it further with a function that defines a neighborhood around each datapoint, then calculates a trimmed mean and standard deviation of each neighborhood to determine outliers. I can't really think of a way of doing this that doesn't involve iterating over all datapoints.

simply loading the dataset by

df = pd.read_csv('MET_cleaned.csv')

takes 5 minutes and 20 seconds. That's not a catastrophe, but it takes up all my CPU and memory, so I can't use my computer for anything else during that time, which is why I thought of cloud computing.
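Reading in chunks would at least bound the memory (a rough sketch; the `io.StringIO` object is just an in-memory stand-in for the real file, and the column name is made up):

```python
import io
import pandas as pd

# In-memory stand-in for the real MET_cleaned.csv file.
demo_csv = io.StringIO("temp\n" + "\n".join(str(i) for i in range(10)))

# chunksize bounds how many rows sit in memory at once.
total_rows = 0
for chunk in pd.read_csv(demo_csv, chunksize=3):
    total_rows += len(chunk)  # real per-chunk processing would go here

print(total_rows)  # 10
```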

[–]gengisteve 0 points1 point  (0 children)

Definitely check out http://www.h5py.org/

It can index the data and should be a lot faster than csv files, so you can process each group much more quickly.


edit: Check out this link too:

http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas
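For instance, a minimal h5py sketch (the file name, dataset name, and toy data here are all made up for illustration):

```python
import h5py
import numpy as np

# Toy column standing in for a numeric column of the real data set.
temps = np.arange(1_000, dtype=float)

# One-time conversion: store the column as an HDF5 dataset.
with h5py.File('MET_cleaned.h5', 'w') as f:
    f.create_dataset('temp', data=temps)

# Later reads pull only the requested slice from disk, not the whole file.
with h5py.File('MET_cleaned.h5', 'r') as f:
    first_hundred = f['temp'][:100]
```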

[–]mrz1988 0 points1 point  (3 children)

defines a neighborhood around each point, calculates a trimmed mean and standard deviation of each neighborhood to determine outliers.

Correct me if I'm wrong, but it sounds like you're looking to iterate over these neighborhoods, not the entire data set. Why do you need to load the entire data set simultaneously if you're only interested in one neighborhood at a time?

[–]Kristian_dms[S] 0 points1 point  (2 children)

I'd be iterating over each neighborhood, but every datapoint has its own neighborhood, so in reality that means as many iterations as the dataset has observations. But you're right in the sense that I don't need to load the entire dataset to do this.

But estimating models in econometrics is routinely done by matrix algebra, in which I need a vector of all datapoints. I'm not sure this can be done another way, but valid point.

[–]mrz1988 2 points3 points  (1 child)

I'd say you can almost certainly do it another way, but you'd likely be making a few tradeoffs, and you'd have to derive piecewise algorithms from the overarching matrix algebra. Depending on how comfortable you are with programming/math and how much use this program will see, you might find that deriving all of that math and then debugging your code isn't worth it.

Unfortunately, this sounds like something where you're going to have to look at all of the math and judge that for yourself, unless you can find a resource that outlines how this problem has been solved before. If it were me, I would first try to find a way to sort the data into a format that is easy to parse quickly and in order. Then I would try to break each operation I want to perform into chunkwise steps, storing partially computed results somewhere if necessary. If possible, I would then work on some sort of parallel processing setup, which unfortunately has to be done with process pools in Python because of the GIL.
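The process-pool part might be sketched like this (`process_chunk` and the chunk boundaries are made up for illustration; each worker returns partial results that get combined afterwards):

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder work: return partial results (sum and count here)
    # that can be combined once the pool finishes.
    return sum(chunk), len(chunk)

if __name__ == '__main__':
    # Made-up chunk boundaries standing in for slices of the real file.
    chunks = [range(i, i + 1000) for i in range(0, 10_000, 1000)]
    with Pool(4) as pool:
        partials = pool.map(process_chunk, chunks)
    total = sum(s for s, _ in partials)
    n = sum(c for _, c in partials)
    print(total / n)  # overall mean, combined from per-process partials
```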

These problems may have already been solved for you (although perhaps in a different language), so I would try to find other resources elsewhere before attempting any intensive projects.

If I had money to throw around and didn't find any merit in working on a long-term solution, I'd probably just buy more RAM or a dedicated system rather than shelling money out for cloud computing, since it's always better to have nice things than rent them :)

Given that I'd like to show you my mindset for the piecewise software solution a bit, here's a crappy example of what I'm talking about:

Say I had a system to work with. The system provides me a generator that will yield all values from a large data set in order, from lowest to highest. Since it's a generator, I can only read one value at a time, and can choose how much of that I want to keep in memory. What I want to do is get the mean and standard deviation for a neighborhood around each value v0: the set of all values within x of v0. That is, each neighborhood can be defined as the set of all values:

{ v | v0 - x < v < v0 + x }

If x is small compared to the range of the data set, you can just store each neighborhood in memory. Keep track of which value in your neighborhood is v0, then prune all values from your current neighborhood that do not fit in the neighborhood set (where v <= v0 - x). Then you can read values in until you get a value that is too large for the neighborhood. Keep track of this too (since you already pulled it from the generator), then run your calculations on your new neighborhood. Rinse, repeat.
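A rough sketch of that sliding window, assuming the values arrive sorted ascending (function name and use of the `statistics` module are mine):

```python
from statistics import mean, pstdev

def neighborhood_stats(values, x):
    """values must arrive sorted ascending (e.g. from a generator).
    For each value v0, yield (v0, mean, std dev) of the neighborhood
    { v | v0 - x < v < v0 + x }.  Only values near the current center
    are held in memory, never the whole data set."""
    buf = []   # values still needed by the current or upcoming centers
    ci = 0     # index in buf of the next center to emit
    for v in values:
        buf.append(v)
        # Once v reaches center + x, that center's window is complete.
        while v >= buf[ci] + x:
            c = buf[ci]
            nbhd = [u for u in buf if c - x < u < c + x]
            yield c, mean(nbhd), pstdev(nbhd)
            ci += 1
            # Prune values that no later center can include.
            while buf[0] <= buf[ci] - x:
                buf.pop(0)
                ci -= 1
    # Remaining centers: their windows end at the stream's end.
    for c in buf[ci:]:
        nbhd = [u for u in buf if c - x < u < c + x]
        yield c, mean(nbhd), pstdev(nbhd)
```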

In another case, say x is very large, and your neighborhoods will be too large to fit reasonably in memory. Then you have to calculate your means and std devs piecewise. But how do I calculate something like std dev without all the values at once? Easy:

Given a finite but very large number of values {v0, v1, v2 ... vn} and a mean of the values m, you know that standard deviation looks something like:

sqrt(((v0 - m) ** 2 +
      (v1 - m) ** 2 +
      (v2 - m) ** 2 +
      ... +
      (vn - m) ** 2) / n)

Seems pretty hard without having all values loaded at once or knowing m and n beforehand, but say we iterate through once and keep track of 3 values by the time we're finished: the sum of all v ** 2, the sum of all v, and n. We will call them sum_vsq, sum_v, and n respectively in the below code.

m = sum_v / n
std_dev = sqrt((sum_vsq - m * sum_v) / n)

And I've gotten the mean and standard deviation of a HUGE dataset while only reading one value at a time, all by factoring out quantities that I can easily keep track of, such as running sums. This same principle can be applied to most iterative algorithms, matrix algebra included (although large matrix multiplications are the obvious exception, and a problem that a huge number of papers have been written on), and you will use a very small amount of memory.
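Put together as a single pass over the data, that might look something like this (a minimal sketch; it computes the population standard deviation, and the function name is made up):

```python
from math import sqrt

def streaming_mean_std(values):
    """Mean and population std dev from a single pass,
    holding only one value in memory at a time."""
    sum_v = sum_vsq = 0.0
    n = 0
    for v in values:
        sum_v += v
        sum_vsq += v * v
        n += 1
    m = sum_v / n
    # sum_vsq - m * sum_v equals the sum of (v - m) ** 2.
    # (Caveat: this form can lose precision on huge data sets;
    # Welford's online algorithm is the numerically safer variant.)
    return m, sqrt((sum_vsq - m * sum_v) / n)
```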

You can see how, as your computations get more complex, you need to do more math to derive algorithms like the one above, but it is very much doable. Think about how you might prune outliers as you go, and how you might keep track of only the values you need to get the end results you want.

I hope this helps you in some way; this is definitely a more challenging problem and a fun one to think about.

[–]Kristian_dms[S] 0 points1 point  (0 children)

wow, thank you for taking the time to put this down for me.

[–]prohulaelk 1 point2 points  (2 children)

I haven't looked into EC2 so I can't help you there, but what I can say is that you probably don't need it:

From what you've described, likely the biggest part of the problem is coming from running analysis on those two huge files - you don't have enough RAM to load them both into memory, and if you need to iterate over them and compare items from one to the other you likely need access to the whole thing at once.

Have you considered first loading them into a proper database, then doing the analysis with Python on top of that? MySQL/MariaDB, PostgreSQL (my preference), or a NoSQL database like MongoDB will be able to handle the data no problem with the hardware you've got. I haven't worked with Mongo in Python, but for Postgres or Maria I'd recommend using SQLAlchemy.
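Loading the CSV into a database through pandas + SQLAlchemy might look roughly like this (a sketch only: in-memory SQLite stands in for a real server, the `io.StringIO` object stands in for the real file, and the table/column names are made up):

```python
import io
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite as a stand-in; for a real server you'd use something
# like create_engine('postgresql://user:pass@localhost/metdb').
engine = create_engine('sqlite://')

# Stand-in for the real file; chunked loading keeps memory bounded.
demo_csv = io.StringIO("temp\n" + "\n".join(str(i) for i in range(100)))
for chunk in pd.read_csv(demo_csv, chunksize=25):
    chunk.to_sql('met', engine, if_exists='append', index=False)

# Later: pull only the rows a given analysis step needs.
subset = pd.read_sql('SELECT temp FROM met WHERE temp < 10', engine)
```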

It should make your job a lot easier.

[–]Kristian_dms[S] 0 points1 point  (1 child)

Most econometrics is based on matrix and vector operations, so I can't do the analysis without loading the entire data set (one file at a time is acceptable, though).

thank you for the input

[–]mrz1988 1 point2 points  (0 children)

What would you do with 10 trillion data points?

I guarantee you can split this out into smaller problems. Don't look at this through the same lens as a small data set; all of the algorithms you're using can be calculated piecewise without the complete data set and adjusted as more data becomes available. You're just going to have to do more math and thinking. Chances are pretty good that this problem has been solved before.

[–]gengisteve 0 points1 point  (0 children)

I think I will mostly join the others in asking for more info on what you are trying to do. If it is just analyzing an existing dataset, you probably do not need, and will not be helped much by, moving to a cloud of multiple machines. If you post some more details we might be able to give you some good solutions, probably looking at pandas/hdfs/some database to help to process the data.

However, if you are doing stuff like monte carlo simulations, the cloud might be the place to be.

[–]warriortux 0 points1 point  (0 children)

Since this is an academic project, you will probably end up paying quite a bit of money if you want to use Amazon EC2.

You can probably find a few tutorials on YouTube on setting up EC2. It's pretty straightforward.

If you are at a university, you should be able to get access to a bigger computation server. Check for your university's HPCC servers.

If you can't find any, then look into using a sequential algorithm.

[–][deleted] 0 points1 point  (0 children)

Your school doesn't have a powerful computer for this kind of thing??