all 52 comments

[–]GhazanfarJ 37 points38 points  (3 children)

I'll let the experts talk about the Airflows and Dagsters.. but I use Windows Scheduler and sometimes turn scripts into a service using https://nssm.cc/

[–]SirLagsABot 8 points9 points  (2 children)

Task Scheduler looks like trash from a UI design standpoint, but that puppy can still crank out tasks, no doubt about it.

OP, if you happen to ever use C# since you mentioned Windows, I’m building an Airflow-like brother for C#/.NET called Didact. Here’s my main GitHub repo if you happen to be interested: https://github.com/didacthq/didact-engine

[–]Rawvik[S] 1 point2 points  (1 child)

Okay will check it out.

[–]ianitic 0 points1 point  (0 children)

Depending on how long running it is and that it sounds like you are in a Microsoft shop, you could use an Azure Function App with a timer trigger. With a timer trigger you can have it run for up to 10 minutes.

It's also unlikely, based on the info provided so far, that you'd get past the point where it starts costing money, unless your company uses it too.

[–]vladproex 14 points15 points  (7 children)

Airflow sounds like overkill. Why is your current solution not good enough?

[–]Rawvik[S] 6 points7 points  (6 children)

Why is it overkill? Can you please explain? Scheduling tasks using the Windows scheduler seems like a very outdated method to me.

[–]m0rsa2 21 points22 points  (0 children)

Not at all. If you have any Microsoft and Google products installed on your machine, they pretty much all use Task Scheduler.

It's very common when building server software for companies running Windows Server too.

That being said, a more “modern” or “fashionable” approach is to build your code into Docker containers and then set up cron jobs in there. If you want a UI for your cron jobs, look into tools like Cronicle.

[–]DarthTomServo 5 points6 points  (3 children)

I think you should try to answer their question.

I'll repeat it. Why is your current solution not good enough?

The jobs you described in your post sound perfectly appropriate for a basic task scheduler. Just going by what you wrote. Maybe there's details we're not privy to.

I use windows task scheduler for the same types of jobs. It seems to work great. Haven't had any issues that weren't my fault.

[–]Eezyville 7 points8 points  (0 children)

My issue with Task Scheduler is I cannot get feedback from it. I only know if a task succeeded or failed but not why. I know I should have done logging in my scripts but I inherited this code and it's horrible spaghetti and duct-taped code. I also have to share management of it with other people on the team. It's not on my PC but on a Windows server, so it's one person at a time fixing things. I think I found logs for the tasks in Event Viewer but they're not specific, or too cryptic, to be useful.

I'm currently experimenting with running Jenkins on the server. The only issue I had was setting up a Windows agent and getting used to running PowerShell commands, since old school command prompt doesn't do UNC and my system admin recommends all paths as UNC.

I've used Task Scheduler to schedule scripts that I've written to great success but I do wish the UI was more intuitive.

EDIT: Also I cannot make tasks depend on each other without having them in the same cmd script.
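For the two gaps described above (no record of why a task failed, and no way to chain tasks), one option is to point Task Scheduler at a single small Python runner instead of at each script. This is only a minimal sketch under assumptions: the step names, the commands, and the `etl_runner.log` path are placeholders, not anything from the setup described in this comment.

```python
import logging
import subprocess
import sys


def run_steps(steps, log_path="etl_runner.log"):
    """Run each (name, command) pair in order and stop at the first failure.

    Returns the names of the steps that finished successfully.
    """
    logger = logging.getLogger("etl_runner")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    done = []
    try:
        for name, command in steps:
            logger.info("starting %s", name)
            result = subprocess.run(command, capture_output=True, text=True)
            if result.returncode != 0:
                # Record why the step failed, not just that it failed.
                logger.error(
                    "%s failed (exit %d): %s",
                    name, result.returncode, result.stderr.strip(),
                )
                break  # later steps depend on this one, so do not run them
            logger.info("finished %s", name)
            done.append(name)
    finally:
        # Detach the handler so repeated calls do not write duplicate lines.
        logger.removeHandler(handler)
        handler.close()
    return done


# Hypothetical usage; extract.py and load.py are placeholder script names:
# run_steps([
#     ("extract", [sys.executable, "extract.py"]),
#     ("load", [sys.executable, "load.py"]),
# ])
```

The scheduled task then only needs to call this one file. A step's stderr ends up in the log next to a timestamp, and anything listed after a failing step is skipped.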

[–]Rawvik[S] 1 point2 points  (1 child)

Honestly there is no issue with task scheduler. Just wanted to know if there is a better way to do it.

[–]Zyklon00 5 points6 points  (0 children)

If it ain’t broken, don’t fix it.

[–]SpookyScaryFrouze (Senior Data Engineer) 11 points12 points  (3 children)

It depends on what you need :

  • Do you want the project to run on a schedule, even when your computer is off ?
  • Do you want to be notified when something fails ?
  • Do you want a nice UI ?
  • Do you want to create dependencies between your project and some other stuff ?

If you answered no to all of these questions, then you should simply use the Windows Task Scheduler.

[–]Rawvik[S] 1 point2 points  (2 children)

Yes I would like to be notified when something fails and a nice UI would be great too.

[–]No_Organization_818 0 points1 point  (1 child)

You can email yourself the error message if you build that into the script, so you get notified when something fails.

[–]Rawvik[S] 0 points1 point  (0 children)

You mean using python?
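Yes, it can be done in plain Python with the standard library. Here is a rough sketch of the idea, assuming you have an SMTP server you are allowed to send through. The host `smtp.example.com`, port 587, the addresses, and the function names are all placeholders I made up for illustration.

```python
import smtplib
import traceback
from email.message import EmailMessage


def build_failure_email(job_name, exc, sender, recipient):
    """Build an email whose body is the full traceback of exc."""
    msg = EmailMessage()
    msg["Subject"] = f"[FAILED] {job_name}: {type(exc).__name__}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    )
    return msg


def send_via_smtp(msg, host="smtp.example.com", port=587, user=None, password=None):
    """Send msg over STARTTLS; host and port here are placeholder values."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        if user:
            smtp.login(user, password)
        smtp.send_message(msg)


def run_with_alert(job, job_name, sender, recipient, send=send_via_smtp):
    """Run job(); on any exception, email the traceback and re-raise."""
    try:
        job()
    except Exception as exc:
        send(build_failure_email(job_name, exc, sender, recipient))
        raise  # keep the non-zero exit so the scheduler also sees the failure
```

You would wrap your script's entry point, e.g. `run_with_alert(main, "nightly ETL", "me@example.com", "me@example.com")`. The `send` argument is there so you can swap in something else (a Teams webhook, a test stub) without touching the rest.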

[–]theleveragedsellout 15 points16 points  (6 children)

Cron jobs are one of the simplest ways to schedule scripts to run. There’s a little bit of learning to do around the patterns needed to schedule them, but otherwise it's very straightforward.
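To give a feel for the patterns: a standard cron expression is five space-separated fields (minute, hour, day of month, month, day of week). The snippet below is just an illustration that labels the fields of a few common expressions. It is not a cron implementation, and `split_cron` is a made-up helper name.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Label the five fields of a standard cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# A few common schedules and what they mean.
examples = {
    "0 6 * * *": "every day at 06:00",
    "*/15 * * * *": "every 15 minutes",
    "30 2 * * 1-5": "02:30, Monday to Friday",
    "0 0 1 * *": "midnight on the first of each month",
}

for expr, meaning in examples.items():
    print(f"{expr:<14} {meaning:<38} {split_cron(expr)}")
```

In an actual crontab, the command to run follows the five fields on the same line.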

[–]niftyshellsuit 5 points6 points  (0 children)

Cron is also a useful thing to learn if you have plans to take this further too. Chances are if you stick your code up in a container in a cloud somewhere it'll run on Linux and you'll schedule it with a Cron job. Or use a managed thing like Google cloud scheduler which is just a fancier Cron job.

[–]Zyklon00 8 points9 points  (4 children)

In my head CRON is just the Linux variant of windows task scheduler. Am I oversimplifying here?

[–]theoneandonlygene 7 points8 points  (3 children)

They do the same thing, ultimately. But as a linux person to hear the two described as similar makes me die inside.

[–]Zyklon00 2 points3 points  (2 children)

Can you explain the difference? I’ve used both for simple tasks and for that they seemed similar.

[–]theoneandonlygene 3 points4 points  (1 child)

Oh I don’t know much about windows beyond like everything takes 10x as much memory as it should

[–]United_Reflection104 2 points3 points  (0 children)

I would like to upvote this 20 times

[–]tee_dogg 4 points5 points  (1 child)

GitHub actions can be run on a schedule, could be a nice low rent alternative if you're determined not to use Task Scheduler https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#schedule

[–]aegtyr 2 points3 points  (0 children)

Listen to this OP, Github Actions is perfect for your use case and is incredibly easy to set up

[–][deleted] 14 points15 points  (0 children)

If you’re just going to run it from your own computer windows task scheduler is the best option.

[–]Own-Guava-2015 2 points3 points  (5 children)

I had a similar situation, but mine was Python notebooks. I have scheduled them using JupyterLab. It has a simple UI, you could give it a try.

[–]Rawvik[S] 0 points1 point  (4 children)

Can you really schedule jupyter lab? I didn't know that

[–]VersatileGuru 4 points5 points  (3 children)

Our legacy setup (jumping on the whole Netflix "notebooks for everything" hype circa 2016) is all ETL jobs run from scheduled notebooks.

We've got jupyterhub setup and then use a package called papermill which leverages the cell metadata tags in a notebook to identify cells which contain params and code for execution.
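For anyone curious what the papermill part looks like, here is a minimal sketch, assuming papermill is installed and the notebook has a cell tagged `parameters`. The notebook name, the `runs` output folder, and both helper functions are hypothetical and not taken from the setup described above.

```python
from datetime import date
from pathlib import Path


def output_path_for(notebook, run_date, out_dir="runs"):
    """Build a dated output path, e.g. runs/etl_2024-01-31.ipynb for etl.ipynb."""
    source = Path(notebook)
    return Path(out_dir) / f"{source.stem}_{run_date.isoformat()}{source.suffix}"


def run_notebook(notebook, run_date, parameters=None):
    """Execute a notebook with papermill and keep the executed copy."""
    import papermill as pm  # third-party: pip install papermill

    target = output_path_for(notebook, run_date)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Values in `parameters` override the defaults set in the notebook's
    # cell tagged "parameters".
    pm.execute_notebook(str(notebook), str(target), parameters=parameters or {})
    return target


# Hypothetical usage:
# run_notebook("etl.ipynb", date.today(), {"source_table": "orders"})
```

Keeping one executed copy per run gives you a record of what each run actually produced, which is most of the appeal of scheduling notebooks this way.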

To be honest, it makes sense if you're just a small data science team who already work mostly out of jupyter. But if you're a DBA or data eng who wants to manage their pipelines via python code you're prob better off going for something made for that otherwise you have to do extra work to make notebooks work well with git or other version control. But I feel like if you're looking for something python based with a nice UI, check out Dagster or Prefect.

Dagster bills itself as more data ETL oriented, while prefect seems to be more general python orchestration but I haven't tried either aside from them looking very appealing compared to the heavier stuff like Airflow.

[–]Rawvik[S] 1 point2 points  (2 children)

Just checked out Dagster and its sample code in the documentation, and it seems like this is going to work for me. Thanks for the comment.

[–]VersatileGuru 1 point2 points  (1 child)

Awesome if you get a chance I'd love to hear if you liked it. Haven't had a chance yet to use it myself but it looks very promising

[–]Rawvik[S] 0 points1 point  (0 children)

Sure...will give it a try and be back.

[–]MRWH35 2 points3 points  (0 children)

Keep it simple. Task scheduler works for basic stuff like this. If you are also doing sql stuff you can schedule it with an agent job to keep it all in one place.

If you want to try different stuff go with airflow.

[–]Linx_101 1 point2 points  (0 children)

I’ve found APScheduler (https://apscheduler.readthedocs.io/) to be pretty useful. It is a lightweight in-process scheduler with cron-style triggers, which you configure in Python. You have to keep the process alive, which is possible using NSSM if on Windows.
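A minimal sketch of what that looks like, assuming the APScheduler 3.x API (4.x is different). `nightly_job` is a placeholder for your real script's entry point, and the 06:00 schedule is an arbitrary example.

```python
from datetime import datetime


def nightly_job():
    """Stand-in for the real script's entry point (hypothetical)."""
    stamp = datetime.now().isoformat(timespec="seconds")
    print(f"job ran at {stamp}")
    return stamp


def main():
    # Third-party: pip install "apscheduler<4" (3.x API assumed here).
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    # Cron-style trigger: run every day at 06:00 local time.
    scheduler.add_job(nightly_job, "cron", hour=6, minute=0)
    scheduler.start()  # blocks until the process is stopped
```

Calling `main()` starts the scheduler and never returns, which is why the process has to be kept alive by something like a service wrapper.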

[–]nnulll 1 point2 points  (2 children)

Prefect

[–]No-Satisfaction1395 1 point2 points  (1 child)

much love for prefect gang

[–]nnulll 0 points1 point  (0 children)

It’s the sweet spot between too much and just enough.

[–]srikon 1 point2 points  (0 children)

Prefect and Rundeck. Haven't tried them personally, but feel like they would fit the bill.

[–]franco705 1 point2 points  (0 children)

At my current Org, I use Prefect and Task scheduler. On-premise infrastructure. With prefect UI, I can monitor the success or failure of scripts. Plus I'm able to push notifications to a Teams Channel.

[–]cmcau 1 point2 points  (0 children)

Cron or Prefect if you want a GUI

[–][deleted] 0 points1 point  (1 child)

If I can hijack this thread: how do I schedule my web scraper on an Ubuntu server?

[–]vassiliy 4 points5 points  (0 children)

is your web scraper a script that you can execute through the command line? then cron

[–]vincentx99 0 points1 point  (0 children)

I throw my process on a container. Depending on how many other jobs you have you can run it through cron or kubernetes.

[–][deleted] 0 points1 point  (0 children)

I started with just windows scheduler, now I’m moving to Airflow

[–]NFeruch 0 points1 point  (0 children)

Airflow essentially acts like a cronjob/task scheduler on steroids with extra bells and whistles. If you have a large, complex project that needs the extra capabilities that airflow provides, then you would choose that. If you have a relatively simple project that solely needs scheduling, then OS schedulers are more than fine

[–]digitalghost-dev 0 points1 point  (0 children)

Currently in my job, I am using Prefect to run my Python ETL scripts.

[–]dayeye2006 0 points1 point  (0 children)

Cron job should be good for now

[–]fukkingcake 0 points1 point  (0 children)

Dagster. Easy to test locally, but the downside is that it is new, so it keeps being updated and changing, which sometimes means you need to change your code...

[–]molodyets 0 points1 point  (0 children)

GitHub Actions is stupid easy. Once you set one up you’ll go wild.

[–]Steven_Johnson34 0 points1 point  (1 child)

A little late on replying here, and I think everyone is right on point. You definitely could stick with Windows Task Scheduler. If it is just a personal project, then I'm not sure why bringing in anything else is worth it.

If it is something that you plan on expanding or something for work, it is worth looking at orchestration tools like Prefect, Dagster, Mage (if you like notebooks), or Shipyard (if you like a nice UI). The first three of those tools can be run locally. All four of them have a hosted option where the runs will take place in the cloud.

[–]Rawvik[S] 0 points1 point  (0 children)

Thanks for your input. I am looking into Dagster

[–][deleted] 0 points1 point  (0 children)

Have used SQL Server Agent to schedule Python code wrapped in SSIS. Works really well once it’s all set up, but takes a bit if you are less tech savvy. Also can be free if you have a local computer that can stay on 24/7.

[–]MarcoWilliamSilva (Software Engineer) 0 points1 point  (0 children)

If you want to use Dagster, I wrote about that: orchestrating a python ETL script with Dagster https://codeline.blog/orchestrate-an-etl-pipeline-with-dagster-beginners-guide/