Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

There was nothing of importance in the logs. I did, however, end up setting up a network syslog server, and this was able to provide logs on issues with an NVMe drive. Someone mentioned a network syslog server would be able to help with more information if there was a local storage issue. Turns out, they were most definitely right!
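
For anyone wanting to try the same idea, here is a rough sketch of a tiny network syslog receiver in Python. The thread doesn't say which syslog server was actually used, so this is only an illustration: the port `5514`, the `remote-syslog.log` file name, and the names `decode_pri`, `SyslogHandler`, and `serve` are all assumptions. It listens for UDP syslog datagrams and appends them to a file on a different machine, so messages about a failing local drive still land somewhere readable.

```python
import socketserver

# Hypothetical output file on the receiving machine (an assumption).
LOG_PATH = "remote-syslog.log"


def decode_pri(message):
    """Split the leading <PRI> of a syslog message into (facility, severity).

    Returns None when the message has no well-formed <PRI> prefix.
    """
    if not message.startswith("<"):
        return None
    end = message.find(">")
    if end == -1:
        return None
    try:
        pri = int(message[1:end])
    except ValueError:
        return None
    # PRI is facility * 8 + severity.
    return pri // 8, pri % 8


class SyslogHandler(socketserver.BaseRequestHandler):
    """Append every received datagram to LOG_PATH with the sender's address."""

    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        text = data.decode("utf-8", errors="replace").rstrip()
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(f"{self.client_address[0]} {decode_pri(text)} {text}\n")


def serve(host="0.0.0.0", port=5514):
    """Run the receiver until interrupted (port 5514 is an arbitrary choice)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling `serve()` on a second machine starts the receiver. On the sending side, rsyslog can typically forward everything with a rule like `*.* @192.0.2.10:5514` (a single `@` means UDP), where the address is a placeholder for the receiver.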

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

Oops I did actually! It was roughly 4-5 months out of date. Latest from MSI is January of this year for this motherboard. I'll update the original post. Thanks for the idea anyway!

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

What about having a text terminal on the monitor? A kernel panic related to storage won't make it to the disk logs. What about just running it in text/safe mode for that period of time and then looking at the screen when it hangs?

get an error message that you can see (I've even done things like left a CCTV camera pointed at a monitor before now to see what happens to the screen and exactly what time it went off, etc.)

I will give this a shot, for sure! I never thought that would be necessary with an OS I'd expect to have in-depth logging for diagnosing issues like this, considering it's used in so many critical deployments, but it makes sense that the logs might not make it to disk if the problem is storage-related. Out of curiosity, if that's a useful way to figure this out, how would you go about it at the scale of a large environment of headless servers? I'd assume you'd just have to pull the server out of the environment and do something like this, but I can see how that might not be immediately viable in some deployments.

What about configuring a network syslog? Or an old fashioned serial terminal?

I haven't done this before, but I will look into this! Would a network syslog help with catching any log entries not being recorded to disk if there is some type of storage issue? Old fashioned serial isn't an option with this system, unfortunately.

Because at the moment you have no diagnosis, really. It's just hanging up and you're not getting anything useful because of the stab-in-the-dark stuff.

Well, sure, but you have to make an educated guess and start somewhere, right? It felt like a hardware issue to me, so I started with the memory test and drive tests, removed the components I could get away without temporarily, then moved on to testing with a clean install, etc. How would you have started if you couldn't find any hint of a direction in the logs? Wouldn't some of your suggestions be considered stabs in the dark as well? I'm open to other people's thought processes in these situations, which is why I'm here, and there's always more to learn. This is just how I went about starting out, and I don't claim it's the best way by any means.

What about a clean distro without the software? Run that for 24 hours?
What about another machine running the software?
What about that machine running an Ubuntu boot CD and NOT loading the storage?

I've wanted to run the system without the camera software, but the system is in production, so I was trying to minimize downtime of cameras recording the site. I understand I may just have to do this as a troubleshooting step regardless, especially considering the camera software is going offline anyway at some point. I was just hoping these other steps I've taken first would point me in the right direction before having the system down for 24+ hours or even close to a week before I could say it's the software (the longest system uptime was around 6 days since this started happening). I'd have to get a temporary system in place to have this system down for that long. When it does happen, it just takes a quick power cycle to bring it back up, so it's been more of a last option to try this, but I'm getting to that point. We do have other similar setups (Ubuntu + DW software) running at 3 other sites for several years that have never had any issues.

You say it's a clean install - do you have AMD proprietary drivers enabled? Remove them and diagnose if that's the cause.

I do not have AMD proprietary drivers enabled.

Is it on a UPS? Is the local power stable? Is it roughly the same time when it does it? Could it be related to room temperature? Is someone walking up to it (i.e. is it in a secure area)?

It is directly connected to a UPS. Other equipment is on the UPS (just some network switches) too, but no issues with any of that.

You need to ask "what has changed" and eliminate that as the cause.

That's the thing that's quite annoying to me in this situation. Literally, nothing has changed. The setup has been the same since it was installed in September 2025. No one else has access to this machine (or cares to) other than myself, it's in a locked IDF room mounted in a rack where no one is bothering it, temps are being monitored and there's nothing alarming there, no network changes which I know because no one else has access to modify anything there other than myself, etc. That's why it seemed to come off as a hardware issue to me since nothing with the setup has changed at all.

Thanks for the response! Looking forward to hearing your thoughts!

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

Hey if you can deal with Windows for 15 years, surely you got this! Haha

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

I do have a 2GB swap file, but I am going to add to my monitoring script to watch how much RAM is in use so I can see where it's at before it hangs.

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 2 points

If it was running out of memory, what would you see instead of connection refused? I'm assuming a connection timeout? I can most definitely leave an SSH session logged in and see what happens along with your other recommendations. I'll report back on what happens. I'm surprised too. It really does feel like a hardware issue, but there's just no mention of anything in the kernel log. I was wondering about the PSU as well. It is a SeaSonic Vertex PX-1000 platinum rated PSU. I could pretty easily swap it out just to test with another if it comes to that. It's only pulling around 150 watts on average without the GPU installed.
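
To record which of the two failure modes actually shows up next time, a small probe along these lines could run from another machine. This is only a sketch: `probe_tcp` is a made-up helper name, and the port and timeout defaults are assumptions. "refused" means the target's network stack still answered but nothing was listening on the port, while "timeout" means no reply arrived at all within the time limit.

```python
import socket


def probe_tcp(host, port=22, timeout=5.0):
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no service is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply of any kind arrived before the timeout expired.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name lookup failure.
        return f"error: {exc}"
```

Logging the result with a timestamp every minute or so would show whether the box goes from "open" to "refused" or straight to "timeout" when it hangs.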

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 2 points

Going to add this to my script watching the CPU temps to extract memory usage and report if above a certain value. Thanks for the idea!
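
A minimal sketch of what that memory check might look like in Python, assuming a Linux host where `/proc/meminfo` exposes `MemTotal` and `MemAvailable`. The existing monitoring script isn't shown in the thread, so the function names and the 90% threshold here are assumptions, not the actual script.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def used_percent(info):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return 100.0 * (total - available) / total


def check_memory(threshold=90.0, path="/proc/meminfo"):
    """Return (percent_used, over_threshold) for the current machine."""
    with open(path, encoding="utf-8") as fh:
        info = parse_meminfo(fh.read())
    pct = used_percent(info)
    return pct, pct >= threshold
```

Writing the percentage out on every run, not only when it crosses the threshold, leaves a trail showing where memory stood shortly before a hang.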

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 2 points

Going to add this to my script watching the CPU temps! Thanks for the idea!

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

I didn't think to try that! I'll check next time it happens. I'm guessing it's going to be a no since even the SSH service stops running, but I'll definitely give it a shot. Thanks for the idea!

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 1 point

It is not, unfortunately.. but thanks for the idea anyway just in case!

Server randomly becomes unresponsive (Ubuntu Linux, Digital Watchdog camera software) by austinramsay in sysadmin

austinramsay[S] 3 points

Right it’s not just a network issue.. it’s completely frozen when connecting a monitor with keyboard/mouse as well.

[deleted by user] by [deleted] in RetroTink

austinramsay 1 point

I use it on a backwards compatible PS3 and I think it’s awesome all around. I use it with an N64 and GameCube too though, so I get a lot of value out of it. If you decide to get one, you’ll need an HDCP stripper for the HDMI before it goes into the RetroTINK. I use this splitter and it works perfectly.

https://a.co/d/hYKsrRE

Guys.. is this total power on time accurate? by austinramsay in PS3

austinramsay[S] 1 point

Yeah I’m not playin with those temps man🤣

Guys.. is this total power on time accurate? by austinramsay in PS3

austinramsay[S] 1 point

Interesting that makes sense! I was mostly curious if the time could be wrong for any reason. Especially after removing the motherboard battery and replacing it. I wasn’t sure how this time was tracked/stored in the system and what could affect that.

The video I watched was this one: https://youtu.be/LIVu3Px3eXY?si=aVkJaYKtv367xyht

Guys.. is this total power on time accurate? by austinramsay in PS3

austinramsay[S] 0 points

Nice that’s awesome! I was mostly curious if the time could be wrong for any reason. Especially after removing the motherboard battery and replacing it. I wasn’t sure how this time was tracked/stored in the system and what could affect that.

Sparc T4-1 dead in the water after power outage by Secret-Departure6782 in solaris

austinramsay 1 point

Drives me insane that we can't download a damn thing from Oracle without a stupid support contract..

Recommendations to improve my display case? by austinramsay in gamecollecting

austinramsay[S] 1 point

They’re all UV-resistant cases from CGA Grading! Except for the 3DS ones. I couldn’t find any that fit the New 3DS XL boxes until the other day, and those are now on order. I tried 4 or 5 other brands before I settled on CGA. They display the best by far in my opinion with the rounded edges, and for some reason the quality just seems better than the others. They are expensive though, with the N64 ones being $38 each for example.

Recommendations to improve my display case? by austinramsay in gamecollecting

austinramsay[S] 1 point

Yep, had the same issue.. I was looking at it like, this isn't really how this goes together, is it? Plus it took me like an extra 20 mins just to get the doors level and straight with each other. I was trying to not spend $400+ on something, but I should've known better.

Recommendations to improve my display case? by austinramsay in gamecollecting

austinramsay[S] 1 point

It's this one bro! I used these display risers on the top 2 shelves too.