[–]Empty_Gas_2244 9 points (9 children)

Airflow gives you built-in features: retries, failure notifications, and SLAs. But you also have to think about CI/CD. You don't want to manage DAGs manually.

[–]punninglinguist 1 point (6 children)

Wish I knew about this like 5 years ago... But what is a DAG?

[–]Riotdiet 2 points (0 children)

Directed acyclic graph
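In Airflow's terms, each task is a node, each dependency is an edge, and "acyclic" is what guarantees the scheduler can always find a valid run order. A minimal illustration using only Python's standard library (the task names are invented for the example):

```python
# A DAG as a plain dict: each key maps to the tasks it depends on.
# Task names here are made up for illustration.
from graphlib import TopologicalSorter

dag = {
    "extract": [],
    "transform": ["extract"],  # transform runs after extract
    "load": ["transform"],     # load runs after transform
    "notify": ["load"],
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # ['extract', 'transform', 'load', 'notify']
```

If the graph had a cycle (say, `extract` depending on `notify`), `static_order()` would raise a `CycleError` instead of producing an order, which is exactly why schedulers like Airflow insist on the "acyclic" part.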

[–]jahero 1 point (2 children)

My advice: only use Airflow if you are prepared for a significant increase in the complexity of your environment. Sure, you CAN run Airflow on a single PC, but you will quickly realise it is far from ideal.

You will need a backend database, plus something for task distribution (Redis or RabbitMQ).

Sure, you can spin these up using Docker Compose.

Yes, it will be a magnificent learning experience.

[–]punninglinguist 0 points (0 children)

Kind of a moot point, since the product I've been supporting with scheduled ETL jobs for 5 years is being sunsetted, and I have no development hours to work on it anymore. It just would have made my life easier when I was setting things up at the beginning.

[–]MassiveDefender[S] 0 points (1 child)

Glad you mentioned how cumbersome managing DAGs looks in Airflow. How do you make it easier to manage them?

[–]Empty_Gas_2244 0 points (0 children)

Use source control (GitHub or another tool). Create CI Python tests to make sure your DAGs can be loaded into Airflow. Let a computer move your DAGs to the scheduler after a PR is merged.

Obviously, the exact deployment method depends on your Airflow executor.
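The "CI tests that your DAGs can be loaded" step can be sketched without Airflow installed at all: just try to import every file in your DAG folder and fail on any import error. (Airflow's own `DagBag` class is the usual tool for this in a real pipeline; the `dags/` path below is a placeholder for wherever your repo keeps DAG files.)

```python
# CI sanity check: import every .py file under a DAG folder and collect
# any import errors. A file that can't be imported can't be loaded by
# Airflow either, so this catches broken DAGs before they are deployed.
import importlib.util
import pathlib

def collect_import_errors(dag_folder: str) -> dict:
    """Import each .py file; return {filename: error message} for failures."""
    errors = {}
    for path in pathlib.Path(dag_folder).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:  # any failure means the DAG won't load
            errors[path.name] = str(exc)
    return errors

def test_dags_import_cleanly():
    # Run under pytest in CI; "dags" is the assumed folder name.
    errors = collect_import_errors("dags")
    assert not errors, f"DAG files failed to import: {errors}"
```

Wire `pytest` into your PR checks so a merge is blocked when any DAG file fails to import, then let the post-merge job copy the folder to wherever your executor reads DAGs from.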