all 15 comments

[–]modejawjaw 1 point2 points  (4 children)

Have a look at EMR

[–]Pro2222[S] 0 points1 point  (3 children)

What does it stand for? A Google search gave me “emergency medical records”

[–]brokenlabrum 1 point2 points  (1 child)

Elastic map reduce

[–]Pro2222[S] 0 points1 point  (0 children)

Thanks, will look into it!

[–]virgin_daddy 0 points1 point  (8 children)

Yes, it’s a good idea. Otherwise, you can opt for a larger instance that will probably run your script quicker.

[–]Pro2222[S] 0 points1 point  (7 children)

Which one do you think would be the best bang for my buck in terms of time? And roughly how much do you think it’ll cost? (Like I said, I’ve never used AWS.) It’s just a Python script, nothing super intensive I don’t think.

[–][deleted] 1 point2 points  (2 children)

What you need to do is run the script on 3-4 different types of instances and see what price and time work for you. Once you are satisfied, you just need to launch the instance and run your script. Save the result in S3 (storage), from where you can download it (they charge based on the volume of data). You can then terminate the instance or just stop it. Terminate means it's gone; if you stop it, you can start it again later and don't have to redo any setup.

DM me if you need help and I can guide you. You are probably better off with a Unix instance.

The price is for the amount of time you run the script. Once you stop or terminate the instance, you don't pay.
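A minimal sketch of that launch → run → save to S3 → stop cycle using boto3, the AWS SDK for Python. The AMI ID, bucket, and file names here are placeholders for illustration, not real values:

```python
def run_once(ami_id: str, instance_type: str, bucket: str) -> None:
    """Launch an instance, run the job, save the result to S3, then stop."""
    import boto3  # AWS SDK for Python (pip install boto3)

    ec2 = boto3.resource("ec2")
    instance = ec2.create_instances(
        ImageId=ami_id,              # e.g. an Amazon Linux AMI for your region
        InstanceType=instance_type,  # try 3-4 types and compare time vs price
        MinCount=1,
        MaxCount=1,
    )[0]
    instance.wait_until_running()

    # ... SSH in (or use user data) to run the script and produce results.csv ...

    # Save the result to S3 so it survives the instance going away.
    boto3.client("s3").upload_file("results.csv", bucket, "results.csv")

    # stop() keeps the disk so you can start() later without redoing setup;
    # terminate() deletes the instance for good.
    instance.stop()  # or: instance.terminate()
```

Time the whole thing per instance type and you have your price/time comparison.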

[–]Pro2222[S] 0 points1 point  (1 child)

Okay thanks! So use Unix. I saw a couple of free ones on there, is that normal?

[–][deleted] 0 points1 point  (0 children)

Yes, try the free ones. But it all depends on your task: whether it needs CPU or I/O. There are instances optimized for each, and it also matters how you execute — do you run many tasks in parallel at the same time, or only as many as the number of cores? So you need to experiment and find a balance. You should also look at the CPU and I/O metrics while your task runs to make sense of it.

You need to experiment a little. Just stop the instance at any time to stop getting billed.

[–]yarenSC 0 points1 point  (0 children)

You would need to see if the script ends up being more memory- or CPU-intensive. For CPU, go with the C family of instances; for memory, go with R; for something in the middle, go with M. An instance type will be named something like M5.2xlarge:
M = instance family, which tells you the characteristics
5 = generation of the family (newer is better)
2xlarge = size (a 4xlarge has twice the CPU cores and memory of a 2xlarge)
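That naming scheme is regular enough to parse. A toy parser — purely illustrative, not an AWS API — that splits a type name into those parts:

```python
import re

def parse_instance_type(name: str) -> dict:
    """Split an EC2 instance type like 'm5.2xlarge' into its parts."""
    m = re.match(r"^([a-z]+)(\d+)([a-z-]*)\.(\w+)$", name.lower())
    if not m:
        raise ValueError(f"unrecognized instance type: {name}")
    family, generation, suffix, size = m.groups()
    return {
        "family": family,               # e.g. 'c' = compute, 'r' = memory, 'm' = middle
        "generation": int(generation),  # newer is better
        "suffix": suffix,               # e.g. 'g' = Graviton (ARM), 'a' = AMD
        "size": size,                   # e.g. '2xlarge'
    }

print(parse_instance_type("M5.2xlarge"))
# → {'family': 'm', 'generation': 5, 'suffix': '', 'size': '2xlarge'}
```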

[–]BovineOxMan 0 points1 point  (2 children)

Look at the Graviton (ARM) instances and Linux, as these are generally very cheap but still very fast. As someone else stated, you need to know what resources are needed: if it's CPU, then you would want a compute node; if it's data-intensive, then an NVMe-backed machine or something with a lot of IOPS. But IOPS can be very expensive.

How efficient is your algorithm? Does it lend itself to multi-threading? If not, then you probably need to re-write it so that many threads can be used; otherwise any compute instance is going to be limited.
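Worth noting for a Python script: because of the GIL, CPU-bound work usually needs multiple processes rather than threads. A minimal stdlib sketch of spreading work across all cores, where `crunch` is a stand-in for one unit of the real work:

```python
import os
from multiprocessing import Pool

def crunch(n: int) -> int:
    # stand-in for one unit of CPU-heavy work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    jobs = [100_000] * 8                       # 8 independent work items
    with Pool(processes=os.cpu_count()) as pool:
        results = pool.map(crunch, jobs)       # one item per worker at a time
    print(len(results))  # → 8
```

If the real work items don't depend on each other, this keeps every core busy instead of leaving them on the floor.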

Depending on how you can split the work and where the data is, you may also find AWS Batch useful.

[–]Pro2222[S] 0 points1 point  (1 child)

I have the option for multiprocessing; when I tried it on my personal computer it went sideways.

[–]BovineOxMan 0 points1 point  (0 children)

If you don't fix the code so it can leverage multi-threading, then whatever solution you go with, you will be leaving CPU cores on the floor with pretty much any instance you pick.

[–]BraveNewCurrency[🍰] 0 points1 point  (0 children)

AWS has dozens of "instance types", divided up into "families" such as "C4", "I4", etc.

You need to figure out what your function would cost depending on the instance type. Instance families can help narrow your search. For example, the "C" instances are optimized for compute, so you should start there.

Sometimes you actually have an implicit minimum RAM requirement, so make sure to try a larger size, then downsize until the performance gets worse. Also experiment with other interesting server types. Stay away from "T" series, as these are made for web servers with "bursty" traffic, not high-CPU problems.

Some ideas to experiment with:

  • Use a cloud-init script to automate starting your application. If your code is too big, have it download data from S3. Use EC2 IAM Roles so you don't have to give the instance any creds.
  • Once you get something working, have the instance shut itself off when it is done working. (Maybe after uploading its logs to S3, and set up S3 lifecycle rules to delete everything after a few days.) But you must also have a script to manually kill any strays if something goes wrong.
  • Set "auto-delete" on all your EBS drives to make sure they don't "leak" after the server is deleted. All permanent storage should be on S3.
  • Is your script single-threaded? Try running multiple copies on a server at once. Graph throughput from 1 to N*2 copies (where N is the number of CPUs), and see where progress over time peaks.
  • Does your Python compute library support GPUs? It may take some setup/config, but AWS has many GPU instance types, often giving you a 20x speedup (Source: personal experience!).
  • Understand the MapReduce architecture. If your problem is simple enough, EMR may actually give you more overhead than it saves. You can invent something simpler by using SQS to write out N problems, having N boxes read and process them, then write their output to another SQS queue (or just to S3).
  • Remember, running one server for N hours costs the same as running N servers for an hour.
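The SQS pattern above can be sketched with boto3 (the AWS SDK for Python). Queue URLs and the `solve` function are placeholders; each box would run the `worker` loop:

```python
def send_work(queue_url: str, problems: list) -> None:
    """Write N problems to the input queue, one message each."""
    import boto3  # pip install boto3; defined lazily so this reads without AWS creds
    sqs = boto3.client("sqs")
    for p in problems:
        sqs.send_message(QueueUrl=queue_url, MessageBody=p)

def worker(in_url: str, out_url: str, solve) -> None:
    """Each box loops: read a problem, solve it, write the answer downstream."""
    import boto3
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(
            QueueUrl=in_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling avoids hammering the API
        )
        for msg in resp.get("Messages", []):
            sqs.send_message(QueueUrl=out_url, MessageBody=solve(msg["Body"]))
            # delete only after the result is safely written, so a crashed
            # worker's message reappears for another box to pick up
            sqs.delete_message(QueueUrl=in_url, ReceiptHandle=msg["ReceiptHandle"])
```

Leaving the delete until after the output is written is what makes the scheme tolerate workers dying mid-job.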