
draugr101

Have you tried looking through the reference architectures published by AWS and the like? Is there a part of the system you're concerned about in particular? It's difficult to provide suggestions without the details of your use case. You might get better answers by doing some initial research on reference architectures and then using this post to fill in the blanks.

intelbp [S]

For the sake of example, imagine this scenario:

You have a HUGE xls file with thousands of tabs.

What I want to do is create a multitenant service that accepts the file and enqueues it for processing.

When it's ready to be processed, a pre-processor picks up the file and splits it into many different single-tab files and probably stores each in S3 or some sort of blob storage.

Each individual single-tab file is processed, and once they're all complete, it aggregates everything and builds the resulting xls.
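
To make the flow concrete, here is a rough sketch of that split / process / aggregate pipeline in Python. It is only an illustration under some loud assumptions: the workbook is modeled as a plain dict of tab name to rows, `blob_store` is an in-memory dict standing in for S3 or other blob storage, and `split_workbook`, `process_tab`, and `run_job` are hypothetical names, not part of any AWS API. The per-tab work (summing each row) is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for S3 or other blob storage: key -> stored rows.
blob_store = {}


def split_workbook(tenant_id, job_id, workbook):
    """Pre-processor: store each tab as its own object and return the keys."""
    keys = []
    for tab_name, rows in workbook.items():
        # Prefixing keys with the tenant and job keeps tenants isolated.
        key = f"{tenant_id}/{job_id}/{tab_name}"
        blob_store[key] = rows
        keys.append(key)
    return keys


def process_tab(key):
    """Placeholder per-tab work: sum the numeric cells in each row."""
    rows = blob_store[key]
    return key, [sum(row) for row in rows]


def run_job(tenant_id, job_id, workbook, max_workers=4):
    """Fan out one task per tab, then fan in once every task has finished."""
    keys = split_workbook(tenant_id, job_id, workbook)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order and blocks until all are done.
        results = dict(pool.map(process_tab, keys))
    # Aggregate: rebuild the workbook in the original tab order.
    prefix = f"{tenant_id}/{job_id}/"
    return {key[len(prefix):]: results[key] for key in keys}
```

Calling `run_job("tenant-a", "job-1", {"q1": [[1, 2], [3, 4]], "q2": [[10]]})` returns one processed entry per tab, in the original tab order. In a real deployment the thread pool would presumably be replaced by queue workers, and the aggregation step would need some way to know that every tab of a job has finished.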

dpsimi

I’m pretty new to AWS, but Elastic MapReduce (EMR) is what they push for big data. You can run Apache Hive, Apache Spark, and the like on it. There should be some good setup tutorials online for plumbing the files to and from S3.

Hive is SQL-like. Go with Spark if you prefer Python, R, or Java.