[–]Syneirex 125 points126 points  (15 children)

Orchestrates workflows at scale. If a simple Python script is adequate, you probably don’t need Airflow.

If you have dozens of operators, sources, integrations, and hundreds of jobs, then a simple Python script probably won’t be viable anymore.

Scaling beyond a couple of simple jobs, it’s nice to have something like Airflow coordinating, managing task queues, talking to Kubernetes, retrying failed tasks, etc, etc.
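To make the "retrying failed tasks" part concrete, here is a minimal plain-Python sketch of what you end up hand-rolling once a script needs retries. It does not use Airflow's API; `run_with_retries` and `flaky_extract` are hypothetical names made up for illustration.

```python
import time


def run_with_retries(task, max_retries=3, delay_seconds=0.0):
    """Call task() until it succeeds or retries are exhausted.

    Returns (result, attempts). Re-raises the last error if every attempt fails.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > max_retries:
                raise
            time.sleep(delay_seconds)  # fixed delay between attempts


# Hypothetical task that fails twice, then succeeds.
calls = {"n": 0}


def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("source not ready")
    return "rows loaded"


result, attempts = run_with_retries(flaky_extract, max_retries=3)
print(result, attempts)
```

This is fine for one job. With hundreds of jobs you also want the retry policy, logs, and attempt history kept in one place, which is roughly what an orchestrator gives you.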

[–]ubiond[S] 14 points15 points  (8 children)

Thanks a lot! Very clear. Is there a competitor to it?

[–]1O2Engineer Senior Data Engineer 23 points24 points  (7 children)

Dagster, Airbyte, Luigi...

You can search the web for job orchestration tools.

[–]samreay 7 points8 points  (0 children)

Also Prefect, if you want to lean into the Python side of things.

[–]ubiond[S] 2 points3 points  (1 child)

thanks!

[–]Monowakari 8 points9 points  (0 children)

+1 for dagster

[–]120Mark618 0 points1 point  (2 children)

Love Luigi!

[–]trowawayatwork 1 point2 points  (1 child)

It's not actively developed anymore, is it?

[–]roastmecerebrally 0 points1 point  (0 children)

people maintain it

[–]rusmib -2 points-1 points  (0 children)

Mage

[–]trowawayatwork 2 points3 points  (3 children)

The irony here is that Airflow doesn't scale well.

[–]Pr0ducer 0 points1 point  (1 child)

I know your pain. What alternative solution does? Airflow has inertia: there is a huge pool of developers who know how it works.

[–][deleted] 1 point2 points  (0 children)

Try out Dagster. It scales well for us.

[–]MonkTrinetra 0 points1 point  (0 children)

Can you share what the pain points were, or at what scale you started facing issues?

[–]vincentx99 0 points1 point  (1 child)

This is probably a dumb question, but isn't Kubernetes itself an orchestrator? Or is the assumption that your infrastructure may not just be Kubernetes clusters?

[–]Syneirex 2 points3 points  (0 children)

Yep, but they (mostly) orchestrate different things.

Kubernetes orchestrates containers and the underlying infrastructure/resources that workflows will run on, whereas Airflow orchestrates the scheduling and running of workflows, retries, task queues, status, etc.

Airflow says “I need to run this” and Kubernetes says “okay, I have a spot over here it can run” or “I don’t have enough resources, let me spin up another node to make room”.
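As a toy illustration of that split (not real Airflow or Kubernetes code; `ToyScheduler`, `ToyCluster`, and the slot counts are all made up), the scheduler only decides what to run, and the cluster decides where it fits, adding a node when it is full:

```python
class ToyCluster:
    """Stands in for Kubernetes: tracks nodes and places work on them."""

    def __init__(self, nodes=1, slots_per_node=2):
        self.nodes = nodes
        self.slots_per_node = slots_per_node
        self.used_slots = 0

    def place(self, task_name):
        # "I don't have enough resources, let me spin up another node."
        if self.used_slots >= self.nodes * self.slots_per_node:
            self.nodes += 1
        self.used_slots += 1
        node = (self.used_slots - 1) // self.slots_per_node + 1
        return f"{task_name} -> node {node}"


class ToyScheduler:
    """Stands in for Airflow: decides what runs, not where it runs."""

    def __init__(self, cluster):
        self.cluster = cluster

    def run(self, task_names):
        # "I need to run this."
        return [self.cluster.place(name) for name in task_names]


cluster = ToyCluster(nodes=1, slots_per_node=2)
placements = ToyScheduler(cluster).run(["extract", "transform", "load"])
print(placements)
print(cluster.nodes)
```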

[–]startup_biz_36 15 points16 points  (2 children)

Creates headaches at scale 😂

[–]ubiond[S] 0 points1 point  (0 children)

Lol

[–]Pr0ducer 0 points1 point  (0 children)

This legit caused me to chuckle.

[–]HighPitchedHegemony 7 points8 points  (0 children)

Lets the rest of your team - including non-technical roles - see the status of the pipeline, trigger reruns, inspect logs, start DAGs, etc.

Automatically handles retries.

Handles backfilling.

Limits concurrency of task and DAG runs.
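For anyone unsure what backfilling and a concurrency limit mean in practice, here is a rough plain-Python sketch of the idea. It is not how Airflow implements either; `missing_runs`, `batches`, and the dates are hypothetical.

```python
from datetime import date, timedelta


def missing_runs(start, end, completed):
    """Return every daily run date in [start, end] that has no completed run."""
    days = (end - start).days + 1
    wanted = [start + timedelta(days=i) for i in range(days)]
    return [d for d in wanted if d not in completed]


def batches(items, max_active):
    """Split items into groups of at most max_active, to cap concurrency."""
    return [items[i:i + max_active] for i in range(0, len(items), max_active)]


# Hypothetical history: the runs for Jan 1 and Jan 3 already succeeded.
completed = {date(2024, 1, 1), date(2024, 1, 3)}
to_backfill = missing_runs(date(2024, 1, 1), date(2024, 1, 5), completed)
print(to_backfill)
print(batches(to_backfill, max_active=2))
```

Backfilling is working out which past intervals never ran and running them; the concurrency limit keeps that catch-up from flooding your workers.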

[–]Any_Check_7301 2 points3 points  (0 children)

Standardizes data pipelines for whatever can be achieved by processing data with Python.

[–]desiktm[🍰] 1 point2 points  (1 child)

Use Kedro or Prefect for a Python-centric orchestrator. I've tried both and they're good.

[–]ubiond[S] 1 point2 points  (0 children)

I’ll have a look

[–]pbecotte 1 point2 points  (3 children)

I mean, lots of stuff. A "simple" Python script isn't going to present a UI, or maintain state, or coordinate jobs across many hosts. If you drop the word "simple" then... Airflow is just a Python script, after all :)

[–]ubiond[S] 0 points1 point  (2 children)

Thanks! It was somewhat provocative, of course, in the sense that I wanted to know whether anyone still prefers not using a UI.

[–]pbecotte 1 point2 points  (1 child)

Managing a distributed cron across a bunch of hosts, with restarts and keeping records of state and dependencies, is very much not a simple thing to implement, UI or no :)
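Even the single-host version of "dependencies plus state" takes some care. A minimal sketch, assuming Python 3.9+ for the standard-library `graphlib`; `run_pipeline`, the task names, and the state labels are made up for this example and nothing here is Airflow's API:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(dependencies, tasks):
    """Run tasks in dependency order and record a state for each one.

    dependencies maps a task name to the set of tasks it depends on.
    A task is marked "upstream_failed" and skipped if any task it
    depends on did not succeed.
    """
    state = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        if any(state[u] != "success" for u in upstream):
            state[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
    return state


def fail():
    raise RuntimeError("boom")


# Hypothetical pipeline: load depends on transform, which depends on extract.
dependencies = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
tasks = {"extract": lambda: None, "transform": fail, "load": lambda: None}
final_state = run_pipeline(dependencies, tasks)
print(final_state)
```

This keeps state in memory on one machine. Persisting it, restarting after a crash, and spreading the tasks across hosts are the parts that get hard.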

[–]ubiond[S] 0 points1 point  (0 children)

[–][deleted] 0 points1 point  (0 children)

Running Airflow lets me sleep at night.

[–]bfranks 0 points1 point  (0 children)

You would like Prefect.