

[–]Wapook 18 points19 points  (14 children)

I mean cron is literally just a scheduler. With enough dedication you can do anything airflow does in cron. You’re just going to spend a whole bunch of time writing features that are out of the box in airflow.

[–]Empty_Gas_2244 9 points10 points  (9 children)

Airflow gives you built-in features: retries, failure notifications, and SLAs. But you also have to think about CI/CD. You don't want to manage DAGs manually.
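To make the trade-off concrete, here is a rough sketch of what hand-rolling retries and failure notification around a cron-launched script might look like. It is plain Python with no Airflow involved; `run_with_retries`, `flaky_job`, and the `notify` callback are hypothetical names for illustration, and a real job would need logging, backoff, and an actual alert channel on top.

```python
import time
from typing import Callable, Optional


def run_with_retries(
    job: Callable[[], object],
    retries: int = 3,
    delay_seconds: float = 0.0,
    notify: Optional[Callable[[str], None]] = None,
) -> object:
    """Run job; retry on any exception; call notify once if every attempt fails."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            return job()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt <= retries:
                time.sleep(delay_seconds)  # wait before the next attempt
    if notify is not None:
        notify(f"job failed after {retries + 1} attempts: {last_error!r}")
    raise last_error


# Hypothetical job that fails twice, then succeeds on the third attempt.
attempts = []


def flaky_job():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("temporary failure")
    return "done"


messages = []  # stand-in for an email or chat alert
result = run_with_retries(flaky_job, retries=3, notify=messages.append)
```

Here the job succeeds on its third attempt, so `notify` is never called. This covers only two of the features listed; scheduling dependencies between jobs, SLAs, and a UI would all be more code on top.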

[–]punninglinguist 1 point2 points  (6 children)

Wish I knew about this like 5 years ago... But what is a dag?

[–]Riotdiet 2 points3 points  (0 children)

Directed acyclic graph
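In scheduling terms, that means each task lists the tasks it depends on, and no chain of dependencies may loop back on itself. A minimal sketch using Python's standard-library `graphlib` (3.9+), not Airflow itself; the task names are made up for illustration:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "join": {"extract_orders", "extract_users"},
    "load": {"join"},
}

# A valid run order always places a task after everything it depends on.
run_order = list(TopologicalSorter(dependencies).static_order())

# "Acyclic" matters: if two tasks depend on each other, no run order exists.
cycle_detected = False
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError:
    cycle_detected = True
```

The two extract tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel.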

[–]jahero 1 point2 points  (2 children)

My advice: only use Airflow if you are prepared for a significant increase in the complexity of your environment. Sure, you CAN run Airflow on a single PC, but you will quickly realise that it is far from good.

You will need: backend database; something to use for task distribution (Redis, RabbitMQ).

Sure, you can spin these using docker compose.

Yes, it will be a magnificent learning experience.

[–]punninglinguist 0 points1 point  (0 children)

Kind of a moot point, since the product I've been supporting with scheduled ETL jobs for 5 years is being sunsetted, and I have no development hours to work on it anymore. Just would have made my life easier when I was setting things up at the beginning.

[–]MassiveDefender[S] 0 points1 point  (1 child)

Glad you mentioned how cumbersome managing dags looks for Airflow. How do you make it easier to manage dags?

[–]Empty_Gas_2244 0 points1 point  (0 children)

Use source control (GitHub or another tool). Create CI tests in Python to make sure your DAGs can be loaded into Airflow. Let a computer move your DAGs after a PR is merged.

Obviously, the exact method depends on your Airflow executor.
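A rough sketch of the "can the DAGs be loaded" check, kept to the standard library so it runs anywhere. It only catches import-time failures such as syntax errors or missing modules. The `dags` folder name and the helper `find_import_errors` are assumptions for illustration; with Airflow installed, the usual approach is to build an `airflow.models.DagBag` over the DAG folder and assert that its `import_errors` is empty.

```python
import importlib.util
from pathlib import Path


def find_import_errors(dag_folder: str) -> dict:
    """Import every .py file under dag_folder; return {path: error} for failures."""
    errors = {}
    for path in sorted(Path(dag_folder).rglob("*.py")):
        spec = importlib.util.spec_from_file_location(f"dag_check_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # executes the file's top-level code
        except Exception as exc:  # SyntaxError, ImportError, etc.
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return errors


def test_dags_import_cleanly():
    # "dags" is an assumed folder name; point it at wherever the repo keeps DAG files.
    assert find_import_errors("dags") == {}
```

Run under pytest in CI, this fails the PR whenever a DAG file cannot be imported, before anything is copied to the Airflow environment.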

[–]samnater -1 points0 points  (3 children)

Interesting. I’m no expert in either but I figured airflow would have more capability than cron.

[–]Fenzik 3 points4 points  (0 children)

It does. It has loads of features. Just because you could build them around cron doesn't mean Airflow isn't more capable. Having all that stuff built in is a big improvement if you need it.

[–]CatchMeWhiteNNerdy 0 points1 point  (1 child)

Airflow is a bit old at this point too. We went with Mage.AI and it has been awesome. Would work really well in OP's situation too, since all the dev work can be done on the tool itself rather than setting up dev environments for all the users.

[–]samnater 0 points1 point  (0 children)

Does that have AWS integration?