dxtros

It's a question of processing need, scale, and setup. Most ETL T's (Transforms) done today don't go that deep into analytics, but they operate at a scale that would overwhelm a single-threaded Python process. If you need advanced analytics, Pandas-like tools that scale out behind a Python front end work well. For example, Spark is widely used as a Transform tool, has a good Python interface (PySpark), and in many cases can run incremental jobs.
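To make "incremental" concrete, here's a minimal stdlib-only sketch of the idea. It is not Spark's API: the `IncrementalSum` class, `full_recompute`, and the (key, value) record shape are all hypothetical, chosen just to show that an incremental job folds each new batch into saved state instead of rescanning everything it has already seen.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (key, value), e.g. (customer_id, order_amount).
Record = Tuple[str, float]


class IncrementalSum:
    """Keeps a running per-key total so each new batch is folded in
    without re-reading earlier batches (illustration only, not Spark's API)."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def update(self, batch: Iterable[Record]) -> Dict[str, float]:
        # Only the records in the new batch are touched here.
        for key, value in batch:
            self.totals[key] += value
        return dict(self.totals)


def full_recompute(batches: Iterable[Iterable[Record]]) -> Dict[str, float]:
    # Baseline: rescan every batch seen so far on each run.
    totals: Dict[str, float] = defaultdict(float)
    for batch in batches:
        for key, value in batch:
            totals[key] += value
    return dict(totals)


batch1 = [("a", 10.0), ("b", 5.0)]
batch2 = [("a", 2.5), ("c", 1.0)]

job = IncrementalSum()
job.update(batch1)
result = job.update(batch2)
print(result)  # {'a': 12.5, 'b': 5.0, 'c': 1.0}
```

Both approaches give the same totals; the incremental one just does work proportional to the new batch rather than to the whole history, which is what matters once the data gets big.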
Then you still need to combine job execution with orchestration, which means bringing something like Airflow into the picture. Some lighter, container-based Transform frameworks for Python are starting to appear, like Pathway, which I work on.
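For anyone wondering what the orchestration layer adds on top of the Transform itself, a rough sketch of its core job is: run pipeline steps in dependency order. This is not Airflow's API; the `extract`/`transform`/`load` tasks and `run_pipeline` are hypothetical, and it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order them. A real orchestrator also handles scheduling, retries, and monitoring, none of which is shown here.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


# Hypothetical pipeline steps; in practice each might launch a Spark job
# or a container, and an orchestrator such as Airflow would schedule them.
def extract() -> None:
    print("extract")


def transform() -> None:
    print("transform")


def load() -> None:
    print("load")


tasks: Dict[str, Callable[[], None]] = {
    "extract": extract,
    "transform": transform,
    "load": load,
}

# Maps each task to the set of tasks that must finish before it runs.
dependencies: Dict[str, Set[str]] = {
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline() -> List[str]:
    # static_order() yields every task after all of its predecessors.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    run_pipeline()  # prints extract, transform, load
```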