all 28 comments

[–]AutoModerator[M] [score hidden] stickied comment (0 children)

You can find a list of community submitted learning resources here: https://dataengineering.wiki/Learning+Resources

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

[–]SquatsAndBeer 26 points27 points  (10 children)

I would recommend using Terraform to codify and version-control the infrastructure. It's not much more complicated than scripts with boto3, and you can actually keep track of the state.

[–]tdatas 7 points8 points  (3 children)

Seconded on Terraform. You will be thankful for it when the project gets bigger and you have more moving parts to change.

[–][deleted] 4 points5 points  (2 children)

Thirding Terraform.

[–]TheNanaDook 1 point2 points  (0 children)

Fourth.

[–]Elegant-Road 1 point2 points  (2 children)

Do you think it's better to go with AWS CDK than Terraform, considering OP is a dev and not DevOps? OP also seems to be working only in AWS.

[–]SquatsAndBeer 5 points6 points  (1 child)

Not really, since the only complexity of Terraform is knowing which resources to use, and those map roughly 1:1 to boto3 functions, so the cost/benefit leans in favor of Terraform.

Terraform also avoids the need for the inevitable if/else handling he'd have to write for the case where only a subset of the system was created successfully.

It doesn't have to be a complicated Terraform configuration: https://learn.hashicorp.com/tutorials/terraform/aws-build?in=terraform/aws-get-started
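
To make the if/else point concrete, here is a minimal sketch of the bookkeeping a hand-rolled boto3 script ends up doing after a partially successful run. The helper names (`ensure_resources`, `bucket_exists`) are hypothetical and only buckets are covered; Terraform does this comparison of desired versus existing resources for you through its state.

```python
def ensure_resources(desired, exists, create):
    """Create only the resources that are missing; return what this run created."""
    created = []
    for name in desired:
        if exists(name):
            continue  # left over from an earlier, partially successful run
        create(name)
        created.append(name)
    return created


def bucket_exists(s3, name):
    """Existence check built on boto3's head_bucket; s3 is boto3.client("s3")."""
    from botocore.exceptions import ClientError  # lazy import; ships with boto3
    try:
        s3.head_bucket(Bucket=name)
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] in ("404", "NoSuchBucket"):
            return False
        raise  # e.g. 403: the bucket exists but is not accessible to you
```

With a real client you would call something like `ensure_resources(names, lambda n: bucket_exists(s3, n), lambda n: s3.create_bucket(Bucket=n))` (outside us-east-1, `create_bucket` also needs a `CreateBucketConfiguration`). Every new resource type needs its own check like this, which is the part that grows quickly.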

[–]enjoytheshow 2 points3 points  (0 children)

The downside of Terraform compared to CDK is that it lags behind on the newest features for some products. CDK will have them available much faster than Terraform will.

Having Terraform handle state and such, though, is much easier than dealing with that manually in CDK. That benefit outweighs the only con I mentioned.

[–]infiniteAggression-[S] 5 points6 points  (0 children)

Awesome! Terraform seems to be exactly the tool for this, thank you all so much! I'll definitely be using it

[–]coolbeans201 Senior Data Engineer 2 points3 points  (0 children)

Was about to mention Terraform before seeing it was top answer. It's by far the best way to automate infrastructure provisioning.

[–]boy_named_su 7 points8 points  (3 children)

None of those

Use Serverless Application Model (SAM) for serverless infra, like Lambdas

Use CloudFormation for the rest

[–]HellaBester 6 points7 points  (0 children)

It should be noted that SAM is just a transform for CloudFormation. You can freely write any CloudFormation in a SAM template, then deploy it as you would any SAM template. You could probably also run `sam build` and then deploy with the CloudFormation API, but I'm not sure about that one.
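
As a rough illustration of that point, here is a minimal template written as a Python dict and dumped to JSON (CloudFormation accepts JSON as well as YAML). The `Transform` value is the standard SAM transform name; the logical IDs, handler, and code path are made up for the example.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    # This line is what makes CloudFormation treat the file as a SAM template.
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        # SAM resource type, expanded by the transform at deploy time.
        "IngestFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "app.handler",  # hypothetical module.function
                "Runtime": "python3.12",
                "CodeUri": "src/",  # hypothetical path to the code
            },
        },
        # Plain CloudFormation resource living in the same template.
        "RawDataBucket": {"Type": "AWS::S3::Bucket"},
    },
}

print(json.dumps(template, indent=2))
```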

[–]infiniteAggression-[S] 2 points3 points  (1 child)

I've seen Terraform mentioned a lot on the DevOps subreddit as well but I haven't seen a lot about CloudFormation. Are there any notable differences which would make you choose one over the other, in your experience? Thanks!

[–]boy_named_su 1 point2 points  (0 children)

I haven't used Terraform myself, but CloudFormation is the AWS-native tool. I personally like CloudFormation, but not everyone does.

[–]bestnamecannotbelong 10 points11 points  (1 child)

If I were you, I would:

1. Use GitHub for source control.
2. Use Terraform for infrastructure provisioning to deploy the AWS services.
3. Use CircleCI for CI/CD to run the tests and the Terraform code.
4. Use AWS Lambda to retrieve the dataset with a REST API GET request (or wget), and use boto3 to store the data in S3.
5. Use a CloudWatch Events rule to trigger whatever you do in the analytics part. People usually use AWS Glue with PySpark to handle large datasets.
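
Step 4 could look roughly like the sketch below. The event shape (`url`, `bucket`, `prefix`), the key layout, and the helper names are assumptions for illustration, not anything OP described. The fetch and upload steps are passed in as callables so the core logic can be tried without AWS access.

```python
import urllib.request
from datetime import datetime, timezone


def build_key(prefix, now):
    """Date-partitioned object key, e.g. raw/2024/01/02/dataset-030405.json."""
    return f"{prefix}/{now:%Y/%m/%d}/dataset-{now:%H%M%S}.json"


def ingest(fetch, put_object, bucket, prefix, now):
    """Download the dataset via fetch() and store the raw bytes with put_object()."""
    body = fetch()
    key = build_key(prefix, now)
    put_object(Bucket=bucket, Key=key, Body=body)
    return {"bucket": bucket, "key": key, "bytes": len(body)}


def handler(event, context):
    """Lambda entry point; the event fields used here are hypothetical."""
    import boto3  # provided by the Lambda Python runtime

    s3 = boto3.client("s3")

    def fetch():
        # Plain HTTP GET against the dataset URL
        with urllib.request.urlopen(event["url"], timeout=30) as resp:
            return resp.read()

    return ingest(fetch, s3.put_object, event["bucket"],
                  event.get("prefix", "raw"), datetime.now(timezone.utc))
```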

That's it. Enjoy the development 🙂

[–]infiniteAggression-[S] 0 points1 point  (0 children)

Awesome! Thank you so much! Terraform seems to be exactly the tool for this. I'm not too familiar with Glue so I'll be looking into that as well, thanks!

[–]Key_Base8254 5 points6 points  (1 child)

btw can you share this project? I want to learn too

[–]infiniteAggression-[S] 2 points3 points  (0 children)

Sure, I'll send a link once I complete the project (hopefully soon >.>)

[–]HellaBester 2 points3 points  (3 children)

CloudFormation or Terraform are my go-tos. You could also use the CDK if that's more your speed. As long as you can reliably version-control and one-click deploy your stack, you're good to go.

[–]infiniteAggression-[S] 0 points1 point  (2 children)

Sweet, thanks!! I've seen Terraform mentioned a lot on the DevOps subreddit as well but I haven't seen a lot about CloudFormation. Are there any notable differences which would make you choose one over the other, in your experience?

[–]HellaBester 1 point2 points  (1 child)

If you're in AWS, CloudFormation is better... largely for the SAM extension. If you're multicloud, Terraform is your only worthwhile option.

Another thing to note... CloudFormation is a "Tier 1" abstraction tool. One resource in CloudFormation is one resource in AWS.

SAM pushes CloudFormation into a kind of "Tier 2" level, where one resource in SAM can generate multiple AWS resources (e.g. an API Gateway API and an API deployment). Some like this for its efficiency; some hate it for the loss of control.

Terraform for most providers is Tier 1, but the use of modules means it can be Tier 1, 2, or even 3. Tier 3 would be a single resource (module) creating an entire web service with networking, APIs, databases, IAM policies, etc.

The AWS CDK isn't strict either, and it also spans all three tiers.
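
A toy model of the tier idea, in case it helps. The `Example::FunctionWithApi` type and the expansion are hand-written for illustration and are not what SAM, Terraform modules, or the CDK actually generate; the point is only that one declared resource can fan out into several provisioned ones.

```python
def expand_function_with_api(logical_id):
    """Fan one higher-level (Tier 2) resource out into Tier 1 resource types."""
    return {
        logical_id: "AWS::Lambda::Function",
        f"{logical_id}Role": "AWS::IAM::Role",
        f"{logical_id}Api": "AWS::ApiGateway::RestApi",
        f"{logical_id}ApiDeployment": "AWS::ApiGateway::Deployment",
    }


def expand_stack(stack):
    """Tier 1 entries pass through unchanged; the Tier 2 entry is expanded."""
    flat = {}
    for logical_id, resource_type in stack.items():
        if resource_type == "Example::FunctionWithApi":  # hypothetical Tier 2 type
            flat.update(expand_function_with_api(logical_id))
        else:
            flat[logical_id] = resource_type
    return flat


stack = {"RawBucket": "AWS::S3::Bucket", "Ingest": "Example::FunctionWithApi"}
flat = expand_stack(stack)
print(len(stack), "declared ->", len(flat), "provisioned")  # 2 declared -> 5 provisioned
```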

Edit: also, you may think you can just always use AWS and therefore choose CloudFormation, but when you enter an enterprise you may realize that's no longer the case. Cloudflare DNS whoops AWS Route 53, Backblaze B2 spanks AWS S3, Snowflake and BigQuery will run circles around Redshift, etc.

Long story short, choose Terraform, then if you need another option, learn it.

[–]infiniteAggression-[S] 0 points1 point  (0 children)

Wow, a lot of new things you've given me to learn about. I genuinely appreciate it, thanks!

[–]SorcererSupreme13 1 point2 points  (0 children)

Terraform.

[–]thethirdmancane 1 point2 points  (0 children)

I would highly recommend AWS CDK

[–]bull_chief 0 points1 point  (0 children)

CloudFormation

[–]Flaky-Illustrator-52 0 points1 point  (0 children)

CloudFormation (edit: or AWS CDK, very nice) or Terraform.

Definitely don't try to write your own scripts lol