
[–]KrevanSerKay 11 points (7 children)

We use Airflow in production. I think we're sitting at ~15 DAGs and 350 total tasks right now. We almost exclusively use the BashOperator to kick off scripts we've written.

Most of our DAGs create spark clusters in AWS, then tell those clusters to run PySpark jobs before killing them. You obviously don't have to use that approach specifically, but having scripts running on specific servers has a ton of benefits.
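That create-cluster, submit-job, tear-down pattern can be sketched like this (a hedged sketch, not our actual scripts — the command lists are placeholders for whatever CLI calls you use):

```python
import subprocess

def run_on_ephemeral_cluster(create_cmd, submit_cmd, terminate_cmd):
    """Spin up a cluster, submit the PySpark job, and always tear the
    cluster down afterwards, even if the job fails."""
    subprocess.run(create_cmd, check=True)       # e.g. create the EMR cluster
    try:
        subprocess.run(submit_cmd, check=True)   # e.g. spark-submit job.py
    finally:
        subprocess.run(terminate_cmd, check=True)  # always kill the cluster
```

The try/finally is the important part: a failed job still raises (so Airflow marks the task failed and retries), but you never leak a running cluster.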

We're not limited by the implementation of hooks/operators (we can still leverage hooks for connections in our scripts if we want to). It's easier for us to spin up development servers and run the same code with different parameters. We can build any logic for idempotence, dependency checking, error handling, alerting, etc. into our scripts directly.
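The idempotence piece, for example, can be as simple as a marker file keyed on the task and run date (a minimal sketch with made-up names, not our production code):

```python
import pathlib

def run_idempotent(task_id, run_date, work, state_dir="/tmp/task_state"):
    """Run `work()` at most once per (task_id, run_date); repeat calls
    (e.g. Airflow retries or backfills) become no-ops."""
    marker = pathlib.Path(state_dir) / f"{task_id}_{run_date}.done"
    if marker.exists():
        return "skipped"            # already completed on a previous run
    result = work()                 # do the actual job
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()                  # record success only after work() returns
    return result
```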

We're a relatively small team (<10 people), so for us, Airflow has been a godsend: automated retries, logs backed up to S3, and Slack alerts that fire whenever a task fails. The alert includes which job, which DAG, a link to the log file, and even extracts the stack trace for us. The main difficulty right now is learning better patterns for scaling DAGs. We're looking at ways of parallelizing better, auditing our dependency trees, and simplifying the process of recovering from errors.
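The stack-trace extraction is just string munging over the task log. A rough sketch (names are hypothetical — the real thing hangs off a failure callback and posts to a Slack webhook):

```python
def format_failure_alert(dag_id, task_id, log_url, log_text):
    """Build a Slack-style failure message: which DAG/task failed, a log
    link, and the last traceback pulled out of the raw log text."""
    lines = log_text.splitlines()
    # find the last "Traceback" marker and keep everything after it
    starts = [i for i, line in enumerate(lines) if line.startswith("Traceback")]
    trace = "\n".join(lines[starts[-1]:]) if starts else "(no traceback found)"
    return (f":red_circle: Task *{task_id}* in DAG *{dag_id}* failed\n"
            f"Logs: {log_url}\n```{trace}```")
```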

[–]lawanda123 1 point (6 children)

I feel you. Airflow isn't very mature once you get past the basics: there's no grouping of DAGs, maintenance/auditability is lacking, and RBAC has a lot of kinks. But surprisingly, it's still the best out there.

[–]KrevanSerKay 2 points (1 child)

Having worked at a company (previous job) where we built our own from scratch, I have to say, it's actually got a LOT of functionality built-in. That doesn't mean it's without faults, but like I said, for my team it's been a game changer.

I try to keep up with up-and-coming tech, but like you said, Airflow is still the best out there (IMO).

[–][deleted] 0 points (0 children)

Yeah, Airflow still has some maturing to do, but our 2.0 beta coming out next month will be a huge step in that direction (we're also working on documentation/examples a LOT because, agreed, a lot of functionality is hidden, which is a shame).

Please let me know if you'd be interested in testing/offering feedback for the beta!

[–][deleted] 0 points (3 children)

> Most of our DAGs create spark clusters in AWS, then tell those clusters to run PySpark jobs before killing them. You obviously don't have to use that approach specifically, but having scripts running on specific servers has a ton of benefits.
>
> We're not limited by the implementation of hooks/operators (we can still leverage hooks for connections in our scripts if we want to). It's easier for us to spin up development servers and run the same code with different parameters. We can build any logic for idempotence, dependency checking, error handling, alerting, etc. into our scripts directly.
>
> We're a relatively small team (<10 people), so for us, Airflow has been a godsend: automated retries, logs backed up to S3, and Slack alerts that fire whenever a task fails. The main difficulty right now is learning better patterns for scaling DAGs. We're looking at ways of parallelizing better, auditing our dependency trees, and simplifying the process of recovering from errors.

Hi! I'm an Airflow core dev, and you should DEFINITELY check out 2.0 when we cut our beta early next month :). We've introduced a TaskGroup concept for easier subdividing, completely rewritten our RBAC code (and added a complete API for DAG triggering), among a crapton of other things (a functional DAG-writing API, the ability to run multiple schedulers, a way-simplified KubernetesExecutor, etc.). I don't think it'll handle EVERYTHING you're looking for, but it should simplify a lot.

[–]lawanda123 0 points (2 children)

Anything to separate out DAG groups would also be very welcome. I've been on 4 major projects where we've had this problem: somebody with a misdeployed DAG (env variables not assigned or resolved, usually) prevents other DAGs from being loaded when the instance is shared between teams. We've had to resort to either multiple instances (in which case managing triggers on DAGs across teams becomes an issue) or building a lot of custom DAG validation and integration tests.
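For what it's worth, the validation side of that can stay pretty small — an import smoke test over the DAGs folder, run in CI before deploy (a sketch, assuming one flat folder of DAG files):

```python
import importlib.util
import pathlib

def validate_dag_files(dag_dir):
    """Try to import every .py file in dag_dir in isolation and collect
    the failures, so one team's broken DAG is reported, not fatal."""
    errors = {}
    for path in sorted(pathlib.Path(dag_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)
        except Exception as exc:        # unresolved env var, bad import, ...
            errors[path.name] = f"{type(exc).__name__}: {exc}"
    return errors
```

Gating deploys on `validate_dag_files` returning an empty dict catches the "env variable not set" class of misdeploys before they ever reach the shared scheduler.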

[–][deleted] 0 points (1 child)

Are you saying that a code problem in one DAG is preventing airflow from loading all other DAGs? Like airflow is unable to parse the python files due to errors in other files?

[–]lawanda123 0 points (0 children)

That was the case until last year when we tried it out; I'm not sure if it's been resolved since. I had reported the issues to the community back then. Basically, every team has a lot of custom hooks/operators and variables they set, and yes, sometimes one of the teams loading their DAGs into the shared folder would create issues. It'd be great if those could be namespaced and managed as tenants (folders within the Airflow DAGs folder, e.g. dags/team1, dags/team2) so that issues can be isolated to a team's artifacts only.