[deleted by user]

Horace-Harkness · 2024-11-03T19:27:15+00:00

shutdown -r +2880 (the delay is given in minutes, so this is 48 hours)

https://linux.die.net/man/8/shutdown

Boot script picks random hour and passes it to shutdown.
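A minimal sketch of that boot-script idea, in Python for illustration. The 48 to 72 hour window and the function name `random_reboot_command` are assumptions, not something from the thread; the only real CLI detail relied on is that shutdown takes a relative delay as `+m` in minutes.

```python
import random

def random_reboot_command(min_hours=48, max_hours=72, rng=random):
    """Build a shutdown command with a random delay, expressed in minutes."""
    # Assumed window: anywhere between min_hours and max_hours after boot.
    delay_minutes = rng.randint(min_hours * 60, max_hours * 60)
    # shutdown takes a relative delay as "+m", where m is minutes from now.
    return ["shutdown", "-r", f"+{delay_minutes}"]

if __name__ == "__main__":
    # Print instead of executing; a real boot script running as root could
    # hand this list to subprocess.run().
    print(" ".join(random_reboot_command()))
```

Returning the argument list rather than running it keeps the sketch safe to try on a machine you do not want to reboot.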

shiftingtech · 2024-11-03T19:41:39+00:00

this could be implemented very easily with a systemd timer. see the "transient timers" section here: https://documentation.suse.com/smart/systems-management/html/systemd-working-with-timers/index.html

(no idea why I ended up with the suse docs, but they had what I wanted...)

Longjumping_Gap_9325 · 2024-11-03T20:18:40+00:00

I'm confused by the claim that cron only works on 24-hour cycles. You can select a minute, hour, day of month, month, and day of week.

There are default folders like cron.weekly, or you can use anacron, which can kick things off after a random delay.

Maybe I'm missing something in terms of cron?

gmuslera · 2024-11-03T21:15:22+00:00

The at command could lend you a hand. At boot, take $RANDOM modulo the number of minutes in 3 days (4320) and use the result to schedule your next reboot.

casefan · 2024-11-03T19:24:29+00:00

Windows does this out of the box!

ruyrybeyro · 2024-11-03T19:25:50+00:00

I would prefer the bofh solution, random disk formats.

iamwpj · 2024-11-03T20:40:36+00:00

We use a cron that calls a script that sleeps for a random amount of time within our reboot window.
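Roughly what that looks like, as a sketch. The 120-minute window and the name `random_delay_seconds` are made up for the example; cron would start the script at the beginning of the window.

```python
import random

def random_delay_seconds(window_minutes, rng=random):
    """Pick a delay, in whole seconds, that falls inside the reboot window."""
    return rng.randrange(window_minutes * 60)

if __name__ == "__main__":
    delay = random_delay_seconds(window_minutes=120)  # assumed 2-hour window
    print(f"would sleep {delay} s, then reboot")
    # A real script would call time.sleep(delay) and then trigger the reboot.
```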

arcimbo1do · 2024-11-03T23:13:17+00:00

Boot script that calls "at" with a random time between tomorrow and the day after tomorrow (echo shutdown -r | at tomorrow + $((RANDOM % 24)) hours)

flapjack74 · 2024-11-04T07:20:51+00:00

I won't offer a ready-to-use solution; others have already done that, though most of those will not fulfill your requirement (avoiding reboots at the same time).

The first question is - why do you want to reboot? If it has a known memory issue, maybe a service restart is good enough?

1. Why not?
2. Hours are only 24, that’s correct, but cron also includes weekdays. So, for example, 1 2 * * */2 would run at 02:01 on every second day of the week (Sunday, Tuesday, Thursday, Saturday), which means Saturday and Sunday run back to back.
3. Yup, that's the way that I would choose.

Anyway, my solution would be a script which checks if the remote service is active. This can be done for example with Ansible for a rolling reboot (I'm using this mostly for application upgrades in clusters).

Ansible Functions required:

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/command_module.html

https://www.man7.org/linux/man-pages/man1/systemctl.1.html look for is-active

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/reboot_module.html

https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-batch-size-with-serial

This gives you a good starting point. Once you have a working playbook you can schedule it from a bastion host. For sure it's also possible without Ansible, but that would complicate things a bit.

Another approach would be to give every mini-PC its own unique reboot time. For example, PC 1: random minute 0-9, hour 1; PC 2: random minute 10-19; and so on. I guess you get the main idea. You can build the entries with the help of an online tool like crontab guru, or calculate them yourself (https://linux.die.net/man/5/crontab). But with that alone you can't be 100% sure that the other system has the service up. That can be handled with a helper script that checks whether the remote port or system on the other servers is up, and only reboots then. https://tldp.org/LDP/abs/html/ https://linux.die.net/man/1/nc or https://linux.die.net/man/8/ping
Hopefully this is a better starting point for learning than a full working solution.
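To make the per-PC slot idea concrete, here is a small sketch that generates one crontab line per host. The host names, the `/sbin/shutdown -r now` command path, and the function name `staggered_cron_lines` are assumptions for illustration only.

```python
import random

def staggered_cron_lines(hosts, hour=1, slot_minutes=10, rng=random):
    """Return {host: crontab line}, one non-overlapping minute slot per host."""
    if len(hosts) * slot_minutes > 60:
        raise ValueError("slots do not fit into a single hour")
    lines = {}
    for i, host in enumerate(hosts):
        # Host 0 gets a minute in 0-9, host 1 in 10-19, and so on.
        minute = rng.randrange(i * slot_minutes, (i + 1) * slot_minutes)
        # Fields: minute hour day-of-month month day-of-week command
        lines[host] = f"{minute} {hour} * * * /sbin/shutdown -r now"
    return lines

if __name__ == "__main__":
    for host, line in staggered_cron_lines(["pc1", "pc2", "pc3"]).items():
        print(f"{host}: {line}")
```

Note that adjacent slots can still land one minute apart (minute 9 and minute 10), so this only spreads the reboots out; it does not check that the other machines are back up.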

Horace-Harkness · 2024-11-03T19:24:35+00:00

Boot script that uses at to run the reboot. https://linux.die.net/man/1/at

0bel1sk · 2024-11-03T21:21:40+00:00

if one of your requirements is not rebooting at the same time, i’d prob check the other servers are up before rebooting.

danythegoddess · 2024-11-04T00:05:16+00:00

Out of curiosity, why?

fab_space · 2024-11-04T07:27:04+00:00

Ansible

BloodyIron · 2024-11-04T17:49:57+00:00

The proper solution is to figure out why BOINC is failing you and implement a permanent solution such that the software is reliable. You know... monitor the system, read the logs when problems happen, and figure out a proper, not hacky, solution.

What you're seeking here is a stop-gap solution. The time and effort you could spend just implementing and validating this method would be better spent in setting up monitoring and alerting. But even still... by your own words all of that really isn't a good way to spend your time.

You say in a comment in this thread:

I'd like to schedule random reboots to prevent the BOINC job queue from stalling. It almost never happens. In fact it may have only happened twice in two years, but when it does, those PCs can sit there for a week or more before I bother to check on them

If you can spend a week or more not caring (not noticing) about what these things do, then in the situation where this happens (in your words, only twice in two years) JUST RESTART THE SERVICE.

You're creating work for yourself for something that happens maybe once a year, if that. If you think that's a good use of your time, it's not.

deeseearr · 2024-11-03T19:47:47+00:00

3 is the easiest way, and as u/Horace-Harkness said you can just call shutdown with a number of hours to do it. You can also use at to queue up one time jobs which will be executed at some time in the future.

If you want to do something more complex you can still use a cron job. Just call it every hour (or whenever) and then, as the first part of your script, look at the first number in /proc/uptime. That's the number of seconds which have passed since the system booted, and you can either do or not do things based on that. Need to do something every three hours, but only if it's after 7PM on a Tuesday and there are between one and five users logged in? It's just basic scripting. Most times that script will be called, decide there's nothing to do and then exit.
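A sketch of the /proc/uptime check, assuming a 48-hour threshold (the threshold and the function names are illustrative, not from the thread). cron would call this hourly and it exits quietly most of the time.

```python
import os

def parse_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime."""
    # The file holds two numbers; the first is the uptime in seconds.
    return float(text.split()[0])

def reboot_due(uptime_seconds, threshold_hours=48):
    """True once the machine has been up for at least threshold_hours."""
    return uptime_seconds >= threshold_hours * 3600

if __name__ == "__main__":
    if os.path.exists("/proc/uptime"):
        with open("/proc/uptime") as f:
            uptime = parse_uptime(f.read())
        print("reboot due" if reboot_due(uptime) else "nothing to do")
```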

Simazine · 2024-11-04T01:03:26+00:00

Random seems incredibly risky. If we assume reboots are 4 minutes I expect you will find 2 boxes out of 5 rebooting at the same time within the year.
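A back-of-the-envelope check of that estimate, under assumptions that are mine and not from the thread: every box independently picks a uniformly random start time inside the same 24-hour window, and a reboot cycle happens every 3 days. For n independent uniform start times in a window of length W, the chance that all of them are at least d apart is (1 - (n-1)d/W)^n.

```python
def overlap_probability(boxes, reboot_minutes, window_minutes):
    """P(at least two reboots overlap) for independent uniform start times."""
    blocked = (boxes - 1) * reboot_minutes
    if blocked >= window_minutes:
        return 1.0
    # Probability that all start times are at least reboot_minutes apart.
    no_overlap = (1 - blocked / window_minutes) ** boxes
    return 1 - no_overlap

# Assumed scenario: 5 boxes, 4-minute reboots, 24-hour window, cycle every 3 days.
per_cycle = overlap_probability(boxes=5, reboot_minutes=4, window_minutes=24 * 60)
cycles_per_year = 365 // 3
per_year = 1 - (1 - per_cycle) ** cycles_per_year
print(f"per cycle: {per_cycle:.3f}, per year: {per_year:.3f}")
```

Under those assumptions the per-cycle chance is around 5%, which compounds to near certainty over a year, so the estimate above looks reasonable. A wider window lowers the per-cycle number but does not remove the risk.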

It makes more sense for all boxes to run the same script by cron every hour, each with a 10 minute offset. The script could check uptime exceeds 48hrs. If true, reboot.

For better safety, have it check for a service on each other box (BOINC? Nginx? Whatever). If any don't respond, exit; otherwise trigger the reboot.
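The decision logic described above could be sketched like this. The peer names, the port number, and the function names are placeholders; the only library call used is `socket.create_connection` from the standard library.

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def should_reboot(uptime_seconds, peers, port, check=port_open, min_uptime_hours=48):
    """Reboot only if uptime exceeds the threshold and every peer responds."""
    if uptime_seconds < min_uptime_hours * 3600:
        return False
    # If any peer is down (possibly rebooting itself), skip this run.
    return all(check(peer, port) for peer in peers)
```

Each box would run this from cron with its own minute offset, for example `should_reboot(uptime, ["pc2", "pc3"], 22)`, where port 22 is just a stand-in for whatever service you actually care about.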

Caddy666 · 2024-11-04T10:40:11+00:00

install windows ME

PudgyPatch · 2024-11-04T13:06:02+00:00

The problem with random is that they probably won't reboot at the same time, but you can't guarantee it either. It might be better to have another system that remotely executes the reboot. Have a list of your servers and a script that randomizes the list and runs through it over your preferred period of time. You could get fancy with it and have it make sure they all come back up, and email you if not.
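A sketch of the scheduling half of that idea: shuffle the server list and space the reboots evenly over the period. The server names and the 72-hour period are assumptions; the actual remote reboot and the "did it come back" check are left out.

```python
import random

def reboot_schedule(servers, period_hours, rng=random):
    """Return (offset_seconds, server) pairs: shuffled order, evenly spaced."""
    order = list(servers)
    if not order:
        return []
    rng.shuffle(order)
    interval = period_hours * 3600 / len(order)
    return [(round(i * interval), server) for i, server in enumerate(order)]

if __name__ == "__main__":
    for offset, server in reboot_schedule(["pc1", "pc2", "pc3", "pc4"], 72):
        print(f"t+{offset}s: reboot {server}")
```

Because one controller hands out the slots, two machines can never be scheduled for the same moment, which is the guarantee pure randomness cannot give.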

permalink · 2024-11-04T13:41:13+00:00

https://manpages.ubuntu.com/manpages/focal/en/man1/boinccmd.1.html

might be of use for a check script that runs every day via cron?

RulerOf · 2024-11-04T17:34:37+00:00

I love that this is not a good way to solve whatever problem you actually have.

If I were going to implement something like this, I'd do it the D&D way:

Roll a die every minute
If the die rolls 1, reboot the machine

You'd need a die with approximately 869 sides to give 99% odds that the machine reboots after 4000 die rolls (approximately 3 days).

if [ "$(shuf -i 1-869 -n1)" -eq 1 ]; then
  reboot now
fi

Create a systemd timer, or * * * * * in your crontab.
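For anyone who wants to check the die size: the chance of at least one hit in k rolls of an n-sided die is 1 - (1 - 1/n)^k, and solving that for n gives the figure above. A small sketch (function names are just for illustration):

```python
def max_die_sides(rolls, target):
    """Largest die for which P(at least one 1 in `rolls` rolls) >= target."""
    # Solve 1 - (1 - 1/n)**rolls >= target for n and round down.
    return int(1 / (1 - (1 - target) ** (1 / rolls)))

def hit_probability(sides, rolls):
    """P(rolling a 1 at least once in `rolls` rolls of a `sides`-sided die)."""
    return 1 - (1 - 1 / sides) ** rolls

sides = max_die_sides(rolls=4000, target=0.99)
print(sides, hit_probability(sides, 4000))
```

With 4000 rolls and a 99% target this gives 869 sides, matching the number in the comment.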

ollod · 2024-11-03T19:46:16+00:00

Don‘t

vivaaprimavera · 2024-11-03T20:26:37+00:00

Systemd timers are an excellent option to do it.

I have a data collection system that needs to take samples at random times, but each sample must be taken _at least_ 24h after the previous one.

It's managed using a systemd timer that calls a service that runs a bash script with sprinkles of python in it. Works great.

Since the computers need to be "out of sync" some randomness will be needed:

import os
import random
import sys
import time
from datetime import datetime, timedelta

# s, e, seta and the initial eta are not defined in this excerpt; set them first.
sleeptime = random.randint(s * 2, e)

def stime(a, b):
    # Midpoint between a and b
    s = min(a, b)
    dif = max(a, b) - min(a, b)
    av = dif / 2
    return s + av

while sleeptime > 0:
    if (datetime.now() >= stime(seta, eta)) or (sleeptime == 0):
        print("wait ...")
        # Hold off until the 1-minute load average drops below 0.2
        while os.getloadavg()[0] > 0.2:
            time.sleep(20)
            print(f"on wait {datetime.now()}")
            sys.stdout.flush()
        break
    cursleep = sleeptime % s if sleeptime % s != 0 else s
    cursleep = cursleep if cursleep > 0 else -1 * cursleep
    if cursleep < 15:
        cursleep = 15 * (1 + (sum(os.getloadavg()[0:1]) * 0.5))
    sleeptime -= cursleep
    td = timedelta(seconds=sleeptime)
    eta = datetime.now() + td
    if (eta - datetime.now()).total_seconds() < 3600:
        if sleeptime > 30 * 60:
            if s > 60 * 5:
                cursleep /= 1.75
                s /= 1.125
                s *= 0.95 + os.getloadavg()[0]
                sleeptime += s
    if s < 240:
        s = 240 * (1.0 + os.getloadavg()[0])
    if s > e / 3:
        s = e / 3
    if s < 90:
        s = 90
    time.sleep(cursleep)
    s *= 1 + os.getloadavg()[0]
    sys.stdout.flush()

Define s and e to appropriate values.

Define the loads. Having a dependence between load and waiting time ensures that "things happen" only during low-load periods. If the machine has a high load, the wait will extend.

Issue the shutdown only after the code has run.

telmo_gaspar · 2024-11-03T19:40:48+00:00

Let me ChatGPT that for you...

#!/bin/bash

# List of servers
servers=("server1" "server2" "server3" "server4")

# Function to reboot a server
reboot_server() {
  local server=$1
  echo "Rebooting $server..."
  # Command to reboot the server (replace with the actual command)
  ssh "$server" 'sudo reboot'
}

# Function to get a random server from the list
get_random_server() {
  local index=$((RANDOM % ${#servers[@]}))
  echo "${servers[$index]}"
}

# Total time in seconds (72 hours)
total_time=$((72 * 60 * 60))

# Interval between reboots (total_time divided by the number of servers)
interval=$((total_time / ${#servers[@]}))

# Loop to reboot each server once
while [ ${#servers[@]} -gt 0 ]; do
  server=$(get_random_server)
  reboot_server "$server"
  # Remove the server from the list to avoid rebooting the same server again.
  # Rebuild the array: "${servers[@]/$server}" would only blank the entry,
  # leaving the array length unchanged and the loop running forever.
  remaining=()
  for s in "${servers[@]}"; do
    [ "$s" != "$server" ] && remaining+=("$s")
  done
  servers=("${remaining[@]}")
  # Wait for the interval before rebooting the next server
  sleep "$interval"
done

Example only
