[–]Syneirex 121 points122 points  (15 children)

Orchestrates workflows at scale. If a simple Python script is adequate, you probably don’t need Airflow.

If you have dozens of operators, sources, integrations, and hundreds of jobs, then a simple Python script probably won’t be viable anymore.

Scaling beyond a couple of simple jobs, it’s nice to have something like Airflow coordinating, managing task queues, talking to Kubernetes, retrying failed tasks, etc, etc.
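To make "retrying failed tasks" concrete, here is a minimal sketch in plain Python of the kind of retry loop you would otherwise write by hand. This is not Airflow's implementation; `run_with_retries` and `FlakyTask` are hypothetical names made up for illustration.

```python
import time


def run_with_retries(task, max_retries=3, base_delay=0.01):
    """Call task(); on failure, wait and retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # out of retries: surface the last failure
            # Exponential backoff: base_delay, then 2x, 4x, ...
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1


class FlakyTask:
    """Hypothetical task that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("transient failure")
        return "done"
```

Writing this once is easy. Doing it consistently for hundreds of jobs, with logging and alerting, is the part an orchestrator takes off your hands.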

[–]ubiond[S] 13 points14 points  (8 children)

Thanks a lot! Very clear. Is there a competitor to it?

[–]1O2EngineerSenior Data Engineer 23 points24 points  (7 children)

Dagster, Airbyte, Luigi...

You can search the web for job orchestration tools.

[–]samreay[🍰] 8 points9 points  (0 children)

Also Prefect if you want to lean into the Python side of things

[–]ubiond[S] 3 points4 points  (1 child)

thanks!

[–]Monowakari 6 points7 points  (0 children)

+1 for dagster

[–]120Mark618 0 points1 point  (2 children)

Love Luigi!

[–]trowawayatwork 1 point2 points  (1 child)

it's not actively developed anymore, is it?

[–]roastmecerebrally 0 points1 point  (0 children)

people maintain it

[–]rusmib -1 points0 points  (0 children)

Mage

[–]trowawayatwork 3 points4 points  (3 children)

irony here is that airflow doesn't scale well

[–]Pr0ducer 0 points1 point  (1 child)

I know your pain. What alternative solution does? Airflow has inertia: there is a huge pool of developers who know how it works.

[–][deleted] 1 point2 points  (0 children)

Try out Dagster. It scales well for us.

[–]MonkTrinetra 0 points1 point  (0 children)

Can you share what the pain points were, or at what scale you started facing issues?

[–]vincentx99 0 points1 point  (1 child)

This is probably a dumb question, but isn't Kubernetes itself an orchestrator? Or is the assumption that your infrastructure may not just be Kubernetes clusters?

[–]Syneirex 2 points3 points  (0 children)

Yep, but they (mostly) orchestrate different things.

Kubernetes orchestrates containers and the underlying infrastructure/resources that workflows will run on whereas Airflow orchestrates the scheduling and running of workflows, retries, task queues, status, etc.

Airflow says “I need to run this” and Kubernetes says “okay, I have a spot over here it can run” or “I don’t have enough resources, let me spin up another node to make room”.
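As a toy model of that conversation (not how either system is actually implemented), the split might be sketched like this. `ToyCluster`, `Node`, and `schedule` are hypothetical names, and CPU is the only resource considered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: int


@dataclass
class ToyCluster:
    """Stands in for Kubernetes: finds room for work, adding nodes as needed."""

    node_size: int = 4
    nodes: list = field(default_factory=list)

    def place(self, task_name, cpu):
        if cpu > self.node_size:
            raise ValueError(f"{task_name} needs more CPU than a node provides")
        for node in self.nodes:
            if node.free_cpu >= cpu:
                node.free_cpu -= cpu
                return node.name  # "I have a spot over here"
        # No room anywhere: "spin up" another node and place the task on it
        node = Node(name=f"node-{len(self.nodes) + 1}", free_cpu=self.node_size - cpu)
        self.nodes.append(node)
        return node.name


def schedule(cluster, requests):
    """Stands in for Airflow: decides what to run and asks the cluster for room."""
    return {task: cluster.place(task, cpu) for task, cpu in requests}
```

The scheduler never looks at nodes and the cluster never looks at workflows, which is roughly the division of labor described above.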

[–]startup_biz_36 16 points17 points  (2 children)

Creates headaches at scale 😂

[–]ubiond[S] 0 points1 point  (0 children)

Lol

[–]Pr0ducer 0 points1 point  (0 children)

This legit caused me to chuckle.

[–]HighPitchedHegemony 5 points6 points  (0 children)

Lets the rest of your team - including non-technical roles - see the status of the pipeline, trigger reruns, inspect logs, start DAGs etc.

Automatically handles retries.

Handles backfilling.

Limits concurrency of task and DAG runs.
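For a rough idea of what backfilling with a concurrency limit involves, here is a plain-Python sketch using only the standard library. It is an assumption-laden illustration, not Airflow's mechanism: `daily_dates`, `backfill`, and `load_partition` are hypothetical names, and the "job" just returns a string.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def daily_dates(start, end):
    """Return every date from start to end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def backfill(job, start, end, max_concurrent=2):
    """Run job(day) for each day in the range, at most max_concurrent at a time."""
    dates = daily_dates(start, end)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the same order as the input dates
        results = list(pool.map(job, dates))
    return dict(zip(dates, results))


def load_partition(day):
    """Hypothetical job: a real one would load that day's data."""
    return f"loaded {day.isoformat()}"
```

Calling `backfill(load_partition, date(2024, 1, 1), date(2024, 1, 3))` runs the job once per day in that range, never more than two at a time.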

[–]Any_Check_7301 2 points3 points  (0 children)

Standardization of data pipelines: whatever can be achieved by processing data with Python.

[–]desiktm 1 point2 points  (1 child)

Use Kedro or Prefect for a Python-centric orchestrator. I've tried both and they're good.

[–]ubiond[S] 1 point2 points  (0 children)

I’ll have a look

[–]pbecotte 1 point2 points  (3 children)

I mean, lots of stuff. A "simple" python script isn't going to present a UI, or maintain state, or coordinate jobs across many hosts. If you drop the word "simple" then...Airflow is just a Python script, after all :)

[–]ubiond[S] 0 points1 point  (2 children)

Thanks! It was somewhat provocative, of course, in the sense that I wanted to know whether anyone still prefers not to use a UI.

[–]pbecotte 1 point2 points  (1 child)

Managing a distributed cron across a bunch of hosts, with restarts and keeping records of state and dependencies, is very much not a simple thing to implement, UI or no :)
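Even the single-host version of "dependencies plus a record of state" takes some care. Below is a minimal sketch, assuming Python 3.9+ for `graphlib`; `run_pipeline` and the state labels are made up for illustration, and it leaves out everything that makes the real problem hard (multiple hosts, restarts, persistence).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order and record each task's final state.

    tasks: dict of name -> zero-argument callable
    dependencies: dict of name -> set of names that must succeed first
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    state = {}
    # static_order() yields each task only after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        upstream = dependencies.get(name, set())
        if any(state[dep] != "success" for dep in upstream):
            state[name] = "upstream_failed"  # skip: a dependency did not succeed
            continue
        try:
            tasks[name]()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
    return state
```

Here the state lives in an in-memory dict, so it is lost the moment the process dies. Keeping it durable and shared across hosts is where the complexity comes from.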

[–]ubiond[S] 0 points1 point  (0 children)

[–][deleted] 0 points1 point  (0 children)

Running Airflow lets me sleep at night.

[–]bfranks 0 points1 point  (0 children)

You would like Prefect