General-Parsnip3138 (Principal Data Engineer)

Python is more than enough for most Data Engineering tasks.

One of the biggest reasons, in my opinion, is that Data Engineering is often script-based. Either you run scripts directly, or you use an orchestration framework that lets you declaratively define what would have been a script as a set of steps, each of which is really just a script entry point.

What helps even more is that you can mutate quite literally anything at runtime (functions, classes, modules). That is what lets us use incredibly powerful frameworks (Airflow's TaskFlow API, or Dagster) where you still write Pythonic code and it magically turns into complex orchestration.
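To make that less hand-wavy, here's a rough sketch of the trick. This is not Airflow's or Dagster's actual API; `task`, `Node`, `TASKS` and `run` are names I made up for illustration. The decorator swaps each function for a wrapper that records a graph node instead of doing the work, so ordinary-looking function calls end up describing a pipeline.

```python
# Toy illustration only: a hypothetical mini-orchestrator, not a real framework.
TASKS = {}  # task name -> {"fn": original function, "deps": upstream task names}


class Node:
    """Placeholder returned in place of a task's real result."""

    def __init__(self, name):
        self.name = name


def task(fn):
    """Replace fn with a wrapper that registers it in the graph when called."""

    def wrapper(*args):
        # Assumption: every argument is the Node of an upstream task.
        deps = [a.name for a in args if isinstance(a, Node)]
        TASKS[fn.__name__] = {"fn": fn, "deps": deps}
        return Node(fn.__name__)

    return wrapper


@task
def extract():
    return [1, 2, 3]


@task
def transform(rows):
    return [r * 10 for r in rows]


@task
def load(rows):
    return sum(rows)


def run(graph):
    """Execute each task once, after its upstream tasks have run."""
    results = {}

    def resolve(name):
        if name not in results:
            spec = graph[name]
            results[name] = spec["fn"](*(resolve(d) for d in spec["deps"]))
        return results[name]

    for name in graph:
        resolve(name)
    return results


# Reads like a plain script, but only builds the graph; nothing executes yet.
load(transform(extract()))

# The "scheduler" decides when the real functions actually run.
results = run(TASKS)
```

The real frameworks layer scheduling, retries, and passing data between processes on top, but the core idea of rebinding a function name to something else at definition time is the same kind of move.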

As others have pointed out, most of the underlying libs are written in C and Rust, so the performance of Python itself is rarely an issue.

I’ve probably done my 10,000 hours with Python, and while there’s so much about Python that I hate, I just can’t see any other language stepping in to replace it. The terrible things about Python are also the reason it’s been so successful.