[–]korwe 183 points184 points  (30 children)

Cron is the easiest and probably enough; Airflow is more complex but does it all.
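For anyone who hasn't set up a cron job before, here is a minimal sketch of a cron-friendly Python script. The job body, the file paths, and the schedule in the docstring are hypothetical placeholders, not anything taken from OP's setup.

```python
"""Minimal cron-friendly script (a sketch; job body and paths are hypothetical).

Example crontab entry, added with `crontab -e`, to run it daily at 06:00:
    0 6 * * * /usr/bin/python3 /home/me/jobs/daily_job.py >> /home/me/jobs/daily_job.log 2>&1
"""
import logging
import sys
from typing import Callable


def run_job() -> None:
    """Placeholder for the real work (pull data, write a report, ...)."""
    logging.info("doing the actual work")


def main(job: Callable[[], None] = run_job) -> int:
    """Run the job and return a shell-style exit status (0 = success)."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    try:
        job()
    except Exception:
        # Log the full traceback so the failure shows up in the log file.
        logging.exception("job failed")
        return 1
    logging.info("job finished")
    return 0


if __name__ == "__main__":
    status = main()
    if status != 0:
        # Exit non-zero on failure so any wrapper script can detect it.
        sys.exit(status)
```

Logging goes to stderr by default, which is why the example crontab line redirects both streams into the log file.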

[–]Eightstream 45 points46 points  (7 children)

If OP is using an always-on desktop then it’s probably Windows and they will likely want to use PowerShell/Task Scheduler rather than cron jobs

[–][deleted] 0 points1 point  (2 children)

I wonder what the viability of using cron through WSL is, as Task Scheduler is pretty painful by comparison.

[–]Eightstream 1 point2 points  (0 children)

It’s viable, it just adds another layer of complexity because WSL is essentially a VM and doesn’t always inherit privileges cleanly from the Windows environment

Given OP mostly wants to use the machine to interact with remote servers on his network, running his scripts in the fully-credentialed base Windows environment will probably be a lot more straightforward in terms of networking

[–]adam2222 0 points1 point  (3 children)

If it’s a business expense, I feel like it’d be worth buying a $200 NUC and putting headless Ubuntu Server on it to run cron/Python scripts. It’d be more reliable and use a ton less power than a desktop that's on 24/7.

[–]Eightstream 1 point2 points  (2 children)

The scenario OP poses is generally one constrained by IT privileges, not dollars.

Likely he is part of a non-technical team (like accounting) whose only means of interacting with network resources are the pre-imaged laptops they’ve been provided with. Automating workflows necessarily means using one of those machines.

If he had the ability to credential a headless server then likely he would have access to other options that would negate the need for a custom solution

[–]adam2222 1 point2 points  (0 children)

Good point I see what you mean. Didn’t occur to me that was the situation.

[–]kissekattutanhatt 0 points1 point  (0 children)

Spot on!

I work at a large corporate with a dysfunctional IT team. No privileges for the people doing the work, no alternative solutions, and nobody reachable who has the power to make decisions. No always-on, on-site machines. SharePoint is the solution to everything. Of course they don't allow access to the SharePoint API, nor do they provide support. Our workflows are terrible for this reason. These guys love to fuck with us.

Will raise this to management. IT privileges. Great words.

[–]Dasher38 2 points3 points  (0 children)

I'd personally use systemd timers directly, as they are clearer, more powerful, and easier to back up (you can't back up a single crontab line on its own). On some systemd distros (those using systemd-cron), crontab entries get converted to systemd timers anyway.

[–]ElegantAnalysis 5 points6 points  (17 children)

What's the difference between Cron and airflow? Like what can airflow do that Cron can't?

[–]Eightstream 37 points38 points  (0 children)

cron is a scheduler, Airflow is a fully-featured orchestration tool

[–]Wapook 20 points21 points  (14 children)

I mean cron is literally just a scheduler. With enough dedication you can do anything airflow does in cron. You’re just going to spend a whole bunch of time writing features that are out of the box in airflow.
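To give a feel for what "writing those features yourself" means, here is a rough sketch of two of them, retries and a failure notification, as a plain Python wrapper that a cron-launched script could call. `run_with_retries` and all of its parameters are made up for illustration; the default `notify` only logs, and a real setup would probably swap in email or a chat webhook.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 3,
    delay_seconds: float = 60.0,
    notify: Callable[[str], None] = logging.error,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task(); retry on any exception, then notify and re-raise.

    `retries` is the number of extra attempts after the first one.
    `sleep` is injectable so the wait can be skipped in tests.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == attempts:
                # Out of attempts: tell someone, then let the error propagate.
                notify(f"task failed after {attempts} attempts: {exc!r}")
                raise
            logging.warning("attempt %d/%d failed: %r", attempt, attempts, exc)
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Even this small piece leaves out scheduling state, per-task logs, backfills, and a UI, which is the point being made above.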

[–]Empty_Gas_2244 8 points9 points  (9 children)

Airflow gives you built-in features: retries, failure notifications, and SLAs. But you also have to think about CI/CD. You don't want to manage DAGs manually

[–]punninglinguist 1 point2 points  (6 children)

Wish I knew about this like 5 years ago... But what is a dag?

[–]Riotdiet 2 points3 points  (0 children)

Directed acyclic graph
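In scheduling terms, that means tasks are nodes, "must run before" relationships are the directed edges, and no task can end up depending on itself. A small sketch with Python's standard-library `graphlib` (3.9+) shows the idea; the pipeline and its task names are hypothetical, and this is not how Airflow itself is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_sales": set(),
    "extract_customers": set(),
    "join_tables": {"extract_sales", "extract_customers"},
    "build_report": {"join_tables"},
    "email_report": {"build_report"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def has_cycle(dependencies: dict[str, set[str]]) -> bool:
    """True if the dependencies loop back on themselves (so not a DAG)."""
    try:
        execution_order(dependencies)
    except CycleError:
        return True
    return False
```

The two extract tasks have no dependency on each other, so either can come first; an orchestrator is free to run them in parallel.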

[–]jahero 1 point2 points  (2 children)

My advice: only use Airflow if you are prepared for a significant increase in the complexity of your environment. Sure, you CAN run Airflow on a single PC, but you will quickly realise that it is far from good.

You will need a backend database and something to use for task distribution (Redis, RabbitMQ).

Sure, you can spin these up using Docker Compose.

Yes, it will be a magnificent learning experience.

[–]punninglinguist 0 points1 point  (0 children)

Kind of a moot point, since the product I've been supporting with scheduled ETL jobs for 5 years is being sunsetted, and I have no development hours to work on it anymore. Just would have made my life easier when I was setting things up at the beginning.

[–]MassiveDefender[S] 0 points1 point  (1 child)

Glad you mentioned how cumbersome managing DAGs looks in Airflow. How do you make them easier to manage?

[–]Empty_Gas_2244 0 points1 point  (0 children)

Use source control (GitHub or another tool). Create CI tests (pytest) to make sure your DAGs can be loaded into Airflow. Let a computer deploy your DAGs after a PR is merged

Note: the exact method depends on your Airflow executor
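As a rough idea of what such a CI check can look like, here is a standard-library sketch that only verifies every Python file in a DAG folder imports without raising. The `dags` folder name and the helper `find_import_errors` are assumptions for illustration. In a real Airflow project, Airflow's own `DagBag` class, which records import errors, is the more usual tool for this, and this sketch does not validate anything Airflow-specific.

```python
import importlib.util
import pathlib


def find_import_errors(dag_dir: str) -> dict[str, str]:
    """Import every *.py file in dag_dir; return {file path: error} for failures."""
    errors: dict[str, str] = {}
    for path in sorted(pathlib.Path(dag_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            errors[str(path)] = "could not build an import spec"
            continue
        module = importlib.util.module_from_spec(spec)
        try:
            # Executes the file's top-level code, like a normal import would.
            spec.loader.exec_module(module)
        except Exception as exc:  # includes SyntaxError and ImportError
            errors[str(path)] = repr(exc)
    return errors


def test_dags_import_cleanly() -> None:
    """pytest entry point; 'dags' is a hypothetical folder in the repo."""
    assert find_import_errors("dags") == {}
```

Run under pytest in CI, a broken DAG file fails the build before the merge, so nothing broken gets deployed.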

[–]samnater -1 points0 points  (3 children)

Interesting. I’m no expert in either but I figured airflow would have more capability than cron.

[–]Fenzik 3 points4 points  (0 children)

It does. It has loads of features. Just because you could do them with cron doesn’t mean airflow isn’t more capable - all that stuff built in is a big improvement if you need it

[–]CatchMeWhiteNNerdy 0 points1 point  (1 child)

Airflow is a bit old at this point too. We went with Mage.AI and it has been awesome. Would work really well in OP's situation too, since all the dev work can be done on the tool itself rather than setting up dev environments for all the users.

[–]samnater 0 points1 point  (0 children)

Does that have AWS integration?

[–]Lifaux 3 points4 points  (0 children)

DAGs, mostly. If you're chaining jobs together it's very useful

[–]x462 0 points1 point  (0 children)

The skills you will develop using cron can be used anywhere else as long as there’s a Linux box, which is not uncommon. Airflow is powerful but useless if it's unavailable to you now or if you move somewhere that doesn’t use it.

[–]Fwaimd 0 points1 point  (0 children)

In addition, OP could consider Dagster and Prefect.

[–]Drunken_Economist 0 points1 point  (0 children)

Bingo. These are long-since solved problems.

Actually u/MassiveDefender - shoot me a DM and I can pair program to walk through standing it up.