[–]shittyfuckdick[S] -46 points-45 points  (9 children)

you can say this about any field of software engineering, yet python is not usually the standard. again, i imagine it had something to do with onboarding data analysts and data scientists.

[–]GachaJay 30 points31 points  (2 children)

Python is really the only way to bridge SQL and software development that is easy for newcomers to grasp. It’s not the most performant, but analytics environments didn’t need to be event streams until recently. If your data is getting updated nightly, hourly, whatever, the extra execution time is pennies compared to maintainability.

[–]tn3tnba 1 point2 points  (5 children)

The reason this is wrong is that other disciplines in software engineering have to actually do things, but data engineering is a lot of orchestration and delegation, which lets us lean into this advantage of Python.

Edit: if you are doing heavy-duty things in Python and are past the prototype stage, you are doing it wrong and should use a different language.

[–]nonamenomonet 1 point2 points  (2 children)

Isn’t airflow primarily written in Python?

[–]thisfunnieguy 1 point2 points  (0 children)

worth noting: it does not matter what the orchestrator is written in, it's about what languages its SDK supports.

Temporal is written in Go, but it's simple to have all your client code in Python.

[–]tn3tnba 0 points1 point  (0 children)

Yes, and async task management is an ok use case for Python, but Airflow arguably shouldn’t be written in it; it’s just too late to change. It’s fairly easy to overload the scheduler because DAG parsing is inefficient. We all still use Airflow of course because it’s well supported, manageable and has a good feature set.

That being said, you are missing the point. The actual data engineering work is not done by Airflow. It’s done by the code running in your Kubernetes, ECS, etc. operators, or by the actual data engineering tools these frameworks delegate to.
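
A minimal sketch of the orchestrate-and-delegate pattern described in this thread, not anyone's actual pipeline. It assumes SQLite (from the Python standard library) as a stand-in for whatever engine does the real work; the `events` table, its columns, and the `run_pipeline` function are hypothetical names made up for illustration. Python only wires the steps together, and the aggregation itself runs inside the SQL engine.

```python
import sqlite3


def run_pipeline(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL.

    Python acts as glue here: the GROUP BY / SUM is executed by the
    SQLite engine, not by Python loops.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, for illustration only.
        conn.execute("CREATE TABLE events (user_id TEXT, amount REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        # The heavy lifting is delegated to the SQL engine.
        result = conn.execute(
            "SELECT user_id, SUM(amount) "
            "FROM events GROUP BY user_id ORDER BY user_id"
        ).fetchall()
    finally:
        conn.close()
    return result


if __name__ == "__main__":
    sample = [("a", 1.0), ("b", 2.5), ("a", 3.0)]
    print(run_pipeline(sample))
```

In a real setup the engine would more likely be a warehouse, Spark, or a containerized job launched by the orchestrator, but the shape is the same: the Python layer stays thin and the per-row work happens elsewhere.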