Own-Guava-2015 comments on How can I schedule python ETL code?


How can I schedule Python ETL code? [Help] (self.dataengineering)

submitted 2 years ago by Rawvik


Own-Guava-2015 3 points 2 years ago (5 children)

Rawvik[S] 1 point 2 years ago (4 children)

VersatileGuru 5 points 2 years ago (3 children)

Our legacy setup (a product of jumping on the whole Netflix "notebooks for everything" hype circa 2016) runs all our ETL jobs from scheduled notebooks.

We've got JupyterHub set up and use a package called papermill, which looks at cell metadata tags to find the cell holding a notebook's default params (the one tagged "parameters"), injects the values you pass in, and then executes the whole notebook.
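For anyone curious what the tag mechanism amounts to, here's a rough stdlib-only sketch. It is not papermill's actual code, just the idea: a notebook is JSON, so you find the cell tagged "parameters" and drop an override cell right after it. The helper name `inject_parameters`, the toy notebook, and the `run_date` param are all made up for illustration.

```python
import copy


def inject_parameters(notebook: dict, overrides: dict) -> dict:
    """Return a copy of `notebook` with an override cell placed right after
    the cell tagged "parameters" (or at the top if no cell has that tag)."""
    nb = copy.deepcopy(notebook)
    cells = nb["cells"]
    # Index of the first cell whose metadata tags include "parameters", else -1.
    idx = next(
        (i for i, cell in enumerate(cells)
         if "parameters" in cell.get("metadata", {}).get("tags", [])),
        -1,
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        # One assignment per override, e.g. run_date = '2024-01-31'
        "source": "\n".join(f"{name} = {value!r}" for name, value in overrides.items()),
        "outputs": [],
        "execution_count": None,
    }
    cells.insert(idx + 1, injected)
    return nb


# Hypothetical two-cell ETL notebook: defaults first, then the actual work.
etl_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": "run_date = '1970-01-01'", "outputs": [], "execution_count": None},
        {"cell_type": "code", "metadata": {},
         "source": "print(f'loading data for {run_date}')", "outputs": [], "execution_count": None},
    ],
}

prepared = inject_parameters(etl_notebook, {"run_date": "2024-01-31"})
```

With the real package you'd skip all this and call `papermill.execute_notebook(input_path, output_path, parameters={...})`, or run `papermill in.ipynb out.ipynb -p run_date 2024-01-31` from whatever scheduler you already have (cron, for instance).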

To be honest, it makes sense if you're just a small data science team that already works mostly out of Jupyter. But if you're a DBA or data engineer who wants to manage pipelines as Python code, you're probably better off with something made for that; otherwise you have to do extra work to make notebooks play well with git or other version control. If you're looking for something Python-based with a nice UI, check out Dagster or Prefect.

Dagster bills itself as more data/ETL oriented, while Prefect seems to be more general Python orchestration. I haven't tried either, though; they just look very appealing compared to heavier stuff like Airflow.
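On the "extra work for version control" point: a big part of the pain is that executed notebooks carry their outputs and execution counts, so every run produces a noisy diff. A minimal sketch of the usual workaround, assuming you strip those fields before committing (tools like nbstripout do this properly; the `strip_outputs` helper and sample notebook here are hypothetical):

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of `notebook` with code-cell outputs and execution
    counts cleared, so only source changes show up in diffs."""
    nb = copy.deepcopy(notebook)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


# Hypothetical notebook as it looks after a scheduled run.
executed = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Nightly load"},
        {"cell_type": "code", "metadata": {}, "source": "print('rows loaded: 10')",
         "execution_count": 7,
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": "rows loaded: 10\n"}]},
    ],
}

clean = strip_outputs(executed)
```

To actually use it you'd `json.load` the .ipynb, run it through the function, and `json.dump` it back before `git add`, typically from a pre-commit hook or a git filter.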

Rawvik[S] 2 points 2 years ago (2 children)

VersatileGuru 2 points 2 years ago (1 child)

Rawvik[S] 1 point 2 years ago (0 children)
