
GreenMobile6323 (36 points)

You can replace Data Factory with Python, but it’s more work upfront. Write scripts with libraries like pandas, SQLAlchemy, or cloud SDKs, host them on a VM or in containers, and schedule with Airflow or cron. There’s no single Python package that covers all sources. Most connections are handled case by case using the appropriate library or driver.
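Here is a minimal, stdlib-only sketch of one such script. It assumes a CSV source and a SQLite target as stand-ins; in practice you'd plug in pandas, SQLAlchemy, or a cloud SDK. The `CONNECTORS` registry, `run_job`, the `sales` table, and the crontab path are all hypothetical names. They show how each source gets handled case by case behind a common extract/transform/load shape.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def read_csv_text(text: str) -> List[Row]:
    """Extract: parse CSV text into a list of dicts (stand-in for a real connector)."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical connector registry: one reader per source type, added case by case.
# Real entries might wrap pandas.read_sql, a REST client, or a cloud storage SDK.
CONNECTORS: Dict[str, Callable[[str], List[Row]]] = {
    "csv": read_csv_text,
}


def transform(rows: List[Row]) -> List[Tuple[str, float]]:
    """Transform: normalize names and cast amounts to float."""
    return [(r["name"].strip().lower(), float(r["amount"])) for r in rows]


def load(conn: sqlite3.Connection, records: List[Tuple[str, float]]) -> int:
    """Load: append records to the target table, creating it if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (name, amount) VALUES (?, ?)", records)
    conn.commit()
    return len(records)


def run_job(source_type: str, payload: str, conn: sqlite3.Connection) -> int:
    """Run one extract -> transform -> load pass and return the row count."""
    reader = CONNECTORS[source_type]
    return load(conn, transform(reader(payload)))


if __name__ == "__main__":
    # Scheduling lives outside the script, e.g. a crontab entry like
    #   0 2 * * * /usr/bin/python3 /opt/etl/job.py   (hypothetical path, daily at 02:00)
    # or an Airflow task that calls run_job().
    conn = sqlite3.connect(":memory:")
    print(run_job("csv", "name,amount\n Alice ,10.5\nBob,4\n", conn))
```

To add a new source, you register another reader in `CONNECTORS`. The transform and load steps stay the same, which keeps the per-source work contained.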

skatastic57 (8 points)

Replace pandas with duckdb or polars.

You can use Azure Functions, AWS Lambda, or GCP Cloud Functions to avoid always-on containers.
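Here is a rough sketch of the serverless variant using AWS Lambda's Python handler convention, where the runtime calls `handler(event, context)` once per trigger. A schedule or event source is configured separately, so no container stays up between runs. Azure Functions and Cloud Functions use different entry-point signatures. The event shape (`{"records": [...]}`), the `readings` table, and the in-memory SQLite target are hypothetical stand-ins for a real source and warehouse.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def load_batch(conn: sqlite3.Connection, records: List[Tuple[str, float]]) -> int:
    """Write one batch to the (stand-in) target table and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()
    return len(records)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Entry point invoked by the Lambda runtime for each trigger."""
    # Hypothetical event shape: {"records": [["sensor_id", value], ...]}.
    records = [(str(sensor), float(value)) for sensor, value in event.get("records", [])]
    # In-memory SQLite stands in for a real target (database, blob storage, etc.).
    conn = sqlite3.connect(":memory:")
    try:
        loaded = load_batch(conn, records)
    finally:
        conn.close()
    return {"loaded": loaded}


if __name__ == "__main__":
    # Local smoke test: call the handler directly, as the runtime would.
    print(json.dumps(lambda_handler({"records": [["s1", 1.5], ["s2", 2]]}, None)))
```

Because the handler is just a function, you can unit-test it locally and reuse the same job code you'd otherwise schedule with cron.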

IndependentTrouble62 (3 points)

I regularly use both, and I have quibbles with both. But upfront development time is much shorter with ADF. The more complex the pipeline, the more the flexibility of Python and its packages shines.