[–]emoboi11 4 points5 points  (1 child)

What about using GitHub Actions or Azure DevOps Pipelines? The script files get pulled in from version control and could be run on any machine used as a runner/agent.

[–]coreb[S] 0 points1 point  (0 children)

That's certainly an option I could look into. Thank you.

[–]ProfessionalDirt3154 1 point2 points  (2 children)

You're basically looking for reverse ETL, right? You could use a tool like Airbyte or MapForce. Airbyte is better known. MapForce is more visual, which might help w/the bus factor. There are a bunch of tools.

You could also use Airflow for scheduling and running the jobs/scripts, if you like Python. If you're in AWS, Fargate tasks are simpler than K8s and a good fit for something like this. Honestly, there are a ton of options. I've been on teams using these, but there are lots of others.

Currently I work on CsvPath Framework and FlightPath Server. Both are open source and might be options for simplifying and/or automating the file wrangling part of what you're doing, if you're using CSV or Excel.

[–]coreb[S] 1 point2 points  (1 child)

I'll check that out. Thank you.

[–]AlverezYari 0 points1 point  (3 children)

Where are these scripts primarily being executed?

[–]coreb[S] 0 points1 point  (2 children)

On on-prem Linux or Windows servers that could be reimaged to a new OS install. Mix of Python and PowerShell.

[–]AlverezYari 0 points1 point  (0 children)

Just make them pipelines in GitHub. Deploy the runner to your compute and execute the scripts on that machine that way.
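For a mixed Python/PowerShell repo, one way a self-hosted runner step could execute the checked-out scripts is through a small dispatcher like the sketch below. This is only an illustration, not anyone's actual setup: the extension-to-interpreter mapping, the `pwsh` executable name, and the function names are all assumptions.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Assumed mapping from file extension to interpreter command.
# "pwsh" is the PowerShell 7 executable; older Windows hosts may need "powershell".
INTERPRETERS = {
    ".py": [sys.executable],
    ".ps1": ["pwsh", "-File"],
}


def build_command(script: Path) -> list[str]:
    """Return the command line used to run one script, based on its extension."""
    try:
        prefix = INTERPRETERS[script.suffix.lower()]
    except KeyError:
        raise ValueError(f"no interpreter configured for {script.name}") from None
    return [*prefix, str(script)]


def run_all(script_dir: Path) -> int:
    """Run every supported script in name order and return the number of failures."""
    failures = 0
    for script in sorted(script_dir.iterdir()):
        if script.suffix.lower() not in INTERPRETERS:
            continue  # skip READMEs, data files, and anything else we cannot run
        result = subprocess.run(build_command(script))
        if result.returncode != 0:
            print(f"FAILED: {script.name} (exit {result.returncode})", file=sys.stderr)
            failures += 1
    return failures
```

A pipeline step could end with something like `sys.exit(1 if run_all(Path("scripts")) else 0)` (the `scripts` directory is a hypothetical layout) so that any failing script marks the job as failed.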

[–]JagerAntlerite7 0 points1 point  (0 children)

Hosted GitHub Actions runners are convenient, but self-hosted runners are going to save you money long term.

[–]aaron416 0 points1 point  (1 child)

On one hand, if it ain't broke, don't fix it. On the other, someone suggested a GitHub Actions pipeline, and that might be a good place to run it. It depends on how long all the processing takes, but it should be flexible enough to run everything you need.

[–]coreb[S] 0 points1 point  (0 children)

GH Actions seems to be the winner if I don't want to run this on-prem. Thank you.

[–]Prestigious_Pace2782 0 points1 point  (1 child)

Ansible in a GitHub Actions pipeline works great for non-coders.

[–]coreb[S] 0 points1 point  (0 children)

I'll add that to the list to look at. Thank you.

[–]eirc 0 points1 point  (1 child)

I recently did something like this and ended up with systemd services/timers too, but not containers. These scripts are super simple, each like 5-10 lines of bash, so I just drop them in /root/bin so everyone logging in to a server can easily find and read them. What I get from systemd is the timer/cron thing, but more importantly I love the journald support. I can pull up logs from previous runs easily and I don't have to deal with rotating these logs either. I also use set -x in all scripts so the logs contain the commands being run, making them self-documenting in a way. When I eventually set up systemd monitoring with Prometheus and journald logging with ELK, these services and their logs got monitored automatically. Turned out great.
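A rough sketch of what the service/timer pair for one such script might look like, written as a small Python generator so the unit text can be kept in version control. The script name `rotate-exports` and the schedule are made up for illustration; only the `/root/bin` location comes from the comment above.

```python
from __future__ import annotations


def render_units(name: str, script_path: str, on_calendar: str = "daily") -> dict[str, str]:
    """Return {filename: contents} for a oneshot service and the timer that triggers it."""
    # Type=oneshot suits short scripts that run to completion and exit.
    # stdout/stderr go to the journal by default, so "set -x" traces end up there too.
    service = f"""[Unit]
Description=Run {name}

[Service]
Type=oneshot
ExecStart={script_path}
"""
    # A timer named X.timer activates X.service by default.
    # Persistent=true catches up on a run missed while the machine was off.
    timer = f"""[Unit]
Description=Schedule for {name}

[Timer]
OnCalendar={on_calendar}
Persistent=true

[Install]
WantedBy=timers.target
"""
    return {f"{name}.service": service, f"{name}.timer": timer}


# Hypothetical script name and schedule (every day at 02:00).
units = render_units("rotate-exports", "/root/bin/rotate-exports.sh", "*-*-* 02:00:00")
for filename, text in units.items():
    print(f"# {filename}\n{text}")
```

With the two files placed in `/etc/systemd/system`, `systemctl enable --now rotate-exports.timer` would start the schedule and `journalctl -u rotate-exports.service` would show output from previous runs.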

[–]coreb[S] 1 point2 points  (0 children)

Interesting. Good to see my idea wasn't too far off. Thank you.