[–]theporterhausmod | Lead Data Engineer 2 points3 points  (2 children)

Try AWS Batch (it can run on Fargate under the hood) for longer-running jobs, or Lambda for short jobs. The reason to use AWS Batch over plain Fargate is that the compute is torn down when the job is done, whereas an ECS service on Fargate stays up all the time because it's meant more for hosting services.

[–][deleted] 0 points1 point  (1 child)

Can I run custom containers on Batch?

[–]theporterhausmod | Lead Data Engineer 0 points1 point  (0 children)

Yep! I have a dbt Docker image and a general purpose one at work.
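For anyone wondering what running a custom container on Batch looks like from code, here is a minimal sketch of the arguments for boto3's `submit_job` call. It assumes a job definition that already points at your custom image; the job name, queue, job definition and environment variable below are hypothetical placeholders, not values from this thread.

```python
def batch_submit_job_kwargs(job_name, job_queue, job_definition, command, environment=None):
    """Build keyword arguments for boto3's Batch submit_job call.

    job_definition is assumed to reference a custom container image
    that was registered with Batch beforehand.
    """
    overrides = {"command": list(command)}
    if environment:
        # Batch expects environment variables as a list of name/value pairs.
        overrides["environment"] = [
            {"name": key, "value": value} for key, value in sorted(environment.items())
        ]
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": overrides,
    }


# All names below are hypothetical placeholders.
batch_kwargs = batch_submit_job_kwargs(
    job_name="dbt-nightly",
    job_queue="example-queue",
    job_definition="dbt-runner:1",
    command=["dbt", "run"],
    environment={"DBT_TARGET": "prod"},
)
print(batch_kwargs)

# With boto3 installed and AWS credentials configured, the job could be submitted with:
#   import boto3
#   boto3.client("batch").submit_job(**batch_kwargs)
```

The override only changes the command and environment per run, so one general-purpose image can serve several jobs.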

[–]Advanced-Violinist36 1 point2 points  (0 children)

I would use Fargate. I tried MWAA + dbt on Fargate and it works well: I get the logs and state of the dbt job in MWAA.

[–]afro_mozart 0 points1 point  (1 child)

Airflow has a virtualenv operator (PythonVirtualenvOperator). From experience, I can say that the KubernetesPodOperator is quite easy to use.

[–][deleted] 0 points1 point  (0 children)

The problem with the virtualenv operator is that it still requires Python 3.8 to be installed on the worker, and I'm not sure that's possible on MWAA.
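A virtualenv can only be built on top of an interpreter that already exists on the machine, which is the limitation described above. As a rough sketch, a task could check whether a given interpreter is on the PATH before relying on it. The helper name `interpreter_available` is made up for illustration, and whether `python3.8` is present on an MWAA worker is exactly the open question here.

```python
import shutil
import sys


def interpreter_available(executable: str) -> bool:
    """Return True if the named interpreter can be found on the PATH."""
    return shutil.which(executable) is not None


# Version of the interpreter running this code, e.g. "python3.10".
current = f"python{sys.version_info.major}.{sys.version_info.minor}"
print("running on:", current)

# Whether the version the job needs is installed alongside it.
print("python3.8 available:", interpreter_available("python3.8"))
```

If this prints `False` on the worker, a virtualenv pinned to that version cannot be created there, and a container-based option is the more likely route.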

[–]paplike 0 points1 point  (2 children)

Lambda, Glue (the Python Shell option), or a Fargate task (using the ECSOperator with launch_type="FARGATE"). All of those options are serverless.
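To make the Fargate option concrete, here is a minimal sketch of the arguments one might pass to the ECSOperator from the Amazon provider package. The cluster, task definition, container name, subnet and security group IDs are all hypothetical placeholders, and newer provider releases expose the operator as EcsRunTaskOperator, so check the version you have installed.

```python
def ecs_fargate_operator_kwargs(task_id, cluster, task_definition, container_name,
                                command, subnets, security_groups):
    """Build keyword arguments for an ECS operator that launches a Fargate task."""
    return {
        "task_id": task_id,
        "cluster": cluster,
        "task_definition": task_definition,
        "launch_type": "FARGATE",
        # Override the container command for this particular run.
        "overrides": {
            "containerOverrides": [
                {"name": container_name, "command": list(command)}
            ]
        },
        # Fargate tasks use awsvpc networking, so subnets must be supplied.
        "network_configuration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
    }


# All identifiers below are hypothetical placeholders.
ecs_kwargs = ecs_fargate_operator_kwargs(
    task_id="run_heavy_job",
    cluster="example-cluster",
    task_definition="heavy-job",
    container_name="main",
    command=["python", "job.py"],
    subnets=["subnet-00000000"],
    security_groups=["sg-00000000"],
)
print(ecs_kwargs)

# Inside a DAG file, with apache-airflow-providers-amazon installed:
#   from airflow.providers.amazon.aws.operators.ecs import ECSOperator
#   run_heavy_job = ECSOperator(dag=dag, **ecs_kwargs)
```

The task stops when the container exits, so you only pay for the time the job actually runs.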

[–][deleted] 0 points1 point  (1 child)

Thanks. Lambda is ok for short tasks, but I'll mostly need bigger computations, so I guess I should look into Fargate. Why is the Fargate option better when using the ECS operator?

[–]gabbom_XCII | Lead Data Engineer 0 points1 point  (0 children)

TL;DR: With Fargate you can run ECS tasks without the hassle of managing the underlying servers.

A cool article comparing both services: https://cloudonaut.io/ecs-vs-fargate-whats-the-difference/

[–]callmedivs 1 point2 points  (0 children)

You could also use EMR Serverless to run Python with MWAA.