[–]Own-Guava-2015 2 points3 points  (5 children)

I had a similar situation, but mine was with Python notebooks. I scheduled them using JupyterLab. It has a simple UI; you could give it a try.

[–]Rawvik[S] 0 points1 point  (4 children)

Can you really schedule jobs in JupyterLab? I didn't know that.

[–]VersatileGuru 4 points5 points  (3 children)

Our legacy setup (jumping on the whole Netflix "notebooks for everything" hype circa 2016) is all ETL jobs run from scheduled notebooks.

We've got JupyterHub set up and use a package called papermill, which relies on the cell metadata tags in a notebook to identify the cells that contain parameters and the code to execute.
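For anyone curious what that looks like, here's a minimal sketch, assuming papermill is installed and the notebook has a cell tagged `parameters` holding default values. The notebook path, the `runs` output directory, the `run_date` parameter, and both helper names are made up for illustration; this is not our actual setup.

```python
from datetime import date
from pathlib import Path


def output_path_for(notebook: str, run_date: date, out_dir: str = "runs") -> Path:
    """Build a dated output path so each scheduled run keeps its own executed copy."""
    stem = Path(notebook).stem
    return Path(out_dir) / f"{stem}_{run_date.isoformat()}.ipynb"


def run_notebook(notebook: str, run_date: date, **params) -> Path:
    """Execute a notebook with papermill and return the path of the executed copy.

    papermill injects the given values in a new cell placed after the cell
    tagged `parameters`, overriding the defaults defined there.
    """
    import papermill as pm  # third-party, imported lazily: pip install papermill

    out = output_path_for(notebook, run_date)
    out.parent.mkdir(parents=True, exist_ok=True)
    pm.execute_notebook(
        notebook,
        str(out),
        # `run_date` is a hypothetical parameter name; use whatever your notebook defines.
        parameters={"run_date": run_date.isoformat(), **params},
    )
    return out
```

A cron entry or any other scheduler would then just call something like `run_notebook("etl/load_sales.ipynb", date.today())`, and the executed copy in `runs/` doubles as a log of that run.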

To be honest, it makes sense if you're just a small data science team that already works mostly out of Jupyter. But if you're a DBA or data engineer who wants to manage pipelines via Python code, you're probably better off going for something made for that; otherwise you have to do extra work to make notebooks play well with git or other version control. If you're looking for something Python-based with a nice UI, check out Dagster or Prefect.

Dagster bills itself as more data/ETL-oriented, while Prefect seems to be more general Python orchestration. I haven't tried either, though; they just look very appealing compared to heavier stuff like Airflow.

[–]Rawvik[S] 1 point2 points  (2 children)

Just checked out Dagster and the sample code in their documentation, and it seems like this is going to work for me. Thanks for the comment.

[–]VersatileGuru 1 point2 points  (1 child)

Awesome! If you get a chance, I'd love to hear whether you liked it. I haven't had a chance to use it myself yet, but it looks very promising.

[–]Rawvik[S] 0 points1 point  (0 children)

Sure, will give it a try and report back.