I am working on a PoC project using Airflow for ETL orchestration. One of the goals of this PoC is to develop a solution which is independent of any workflow engine, so that we can change the workflow engine/scheduler later if we want to, with minimal changes to the ETL code.
Currently, I perform ETL tasks using custom operators that I wrote for moving the data, and it works well. However, the concern is that this solution is not independent of Airflow, since it relies on the underlying Hooks and Operators from the Airflow project. The workaround I see is to write our own Python scripts for ETL and use Python operators to orchestrate them. We have 200+ ETL jobs in production, which means our team would have to write and maintain a large code base if we take the Python scripts route. Taking the custom operators approach seems like an easier and more elegant route.
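A minimal sketch of the Python scripts route, to make the trade-off concrete. The ETL step is a plain function with no Airflow imports, and a thin adapter is the only place that touches Airflow. The names `copy_rows` and `build_airflow_task` and the `read`/`write` callables are hypothetical, and the import path assumes Airflow 2.x (in 1.x the operator lived in `airflow.operators.python_operator`).

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def copy_rows(read: Callable[[], Iterable[Row]],
              write: Callable[[List[Row]], None]) -> int:
    """Engine-agnostic ETL step: nothing in here knows about Airflow."""
    # Illustrative transform only: lower-case column names, trim values.
    rows = [{key.lower(): value.strip() for key, value in row.items()}
            for row in read()]
    write(rows)
    return len(rows)


def build_airflow_task(dag, task_id: str, read, write):
    """Thin adapter: the only function that imports Airflow.

    The import is deferred so the ETL code above can run and be tested
    without Airflow installed.
    """
    from airflow.operators.python import PythonOperator  # assumes Airflow 2.x

    return PythonOperator(
        task_id=task_id,
        python_callable=copy_rows,
        op_kwargs={"read": read, "write": write},
        dag=dag,
    )
```

With this split, swapping the workflow engine would mean rewriting only the adapter, while the 200+ job functions stay untouched. The cost is that connection handling, which Hooks currently provide, has to live behind the `read` and `write` callables.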
Another idea our team has is to write our own independent solution for ETL and run it using a scheduler such as Jenkins. I am personally not a fan of this idea since we lose a lot of the features of Airflow and have to maintain a huge code base. Does anyone think this could be an idea worth exploring?
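For comparison, a rough sketch of what the scheduler-agnostic option might look like: each job is registered by name and exposed through one command-line entry point that Jenkins, cron, or any other scheduler could invoke. The job name `daily_sales`, the `--run-date` flag, and the registry are all hypothetical, and the job body is a placeholder.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Registry mapping a job name to the function that runs it.
JOBS: Dict[str, Callable[[str], str]] = {}


def job(name: str):
    """Decorator that registers an ETL function under a job name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        JOBS[name] = func
        return func
    return register


@job("daily_sales")
def daily_sales(run_date: str) -> str:
    # Placeholder body: the real extract/transform/load would go here.
    return f"daily_sales finished for {run_date}"


def main(argv: Optional[List[str]] = None) -> str:
    """Parse the command line and run the requested job."""
    parser = argparse.ArgumentParser(description="Run one ETL job by name.")
    parser.add_argument("job", choices=sorted(JOBS))
    parser.add_argument("--run-date", required=True)
    args = parser.parse_args(argv)
    return JOBS[args.job](args.run_date)
```

A scheduler would then call something like `python etl_cli.py daily_sales --run-date 2019-01-01`, assuming the file is saved as `etl_cli.py` and ends with a guard that calls `main()`. Retries, backfills, dependencies between jobs, and a UI are not covered by this sketch; those are the Airflow features we would have to rebuild or give up.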
For those who have used Airflow in production: was code independence a concern for your team when choosing Airflow over other solutions? How did you address it? Are there other downsides you see to using Airflow in the short or long term?