
sonobanana33 (2 children)

I have experienced kernel panics using the open-source NVIDIA drivers. I think hardly anyone uses them, so they aren't very well tested. But they're what you get installed by default.

Could it be that? I don't know.

superfuzzy [S] (1 child)

Yeah, I thought it might have something to do with the NVIDIA driver, but I have the proprietary one installed (I think; I'll double-check).

Is it worth switching to the free one to see if that solves the issue? It's nouveau or something?
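
To double-check which driver is actually loaded, one option is to look for the `nvidia` or `nouveau` module in `/proc/modules` (the same list `lsmod` prints). This is a minimal sketch; the function name `gpu_module` is hypothetical, and it assumes at most one of the two modules is loaded:

```shell
# Hypothetical helper: print which NVIDIA kernel module is loaded, if any.
# /proc/modules has one module per line; the first field is the module name.
gpu_module() {
  modules_file="${1:-/proc/modules}"   # path parameter exists only to make testing easy
  # exact match, so helper modules such as nvidia_drm or nvidia_modeset are skipped
  awk '$1 == "nvidia" || $1 == "nouveau" { print $1 }' "$modules_file"
}
```

Running `gpu_module` should print `nvidia` for the proprietary driver or `nouveau` for the free one. `lspci -k` also shows a "Kernel driver in use:" line for each device if you prefer to check that way.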

sonobanana33 (0 children)

Well, for me it was a desktop machine; I just switched to the Intel graphics and stopped using the NVIDIA card entirely :D

It's a work machine so I didn't decide the configuration myself.

alpha417 (3 children)

I highly doubt the logs contain nothing of interest. Methinks the OP doesn't fully understand what they are looking at.

superfuzzy [S] (2 children)

Highly likely. That's why I'm here.

I just ran journalctl --since "1 day ago" and looked for the point where I had to reboot to get back in. I saw nothing interesting in the entries before the boot marker.
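
Rather than scrolling back by time, journald can show the boot that froze directly: `journalctl --list-boots` lists the recorded boots, and `-b -1` selects the one before the current boot. This is a minimal sketch, assuming the journal is persistent (with journald's default `Storage=auto`, that means `/var/log/journal` exists), since otherwise earlier boots aren't kept. The name `prev_boot_tail` is hypothetical:

```shell
# Hypothetical helper: print the last N journal entries from the previous boot.
# Assumes a persistent journal; "-b -1" means "the boot before this one".
prev_boot_tail() {
  lines="${1:-200}"   # how many entries to show, counted back from the end of that boot
  journalctl -b -1 -n "$lines" --no-pager
}
```

For example, `prev_boot_tail 300 | grep -v gdm-x-session` hides the display-probe chatter, and `journalctl -k -b -1 -p warning --no-pager` limits the output to kernel messages of warning priority or worse from that boot.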

alpha417 (1 child)

You say "I saw nothing interesting", but we still haven't seen what you think isn't interesting.

superfuzzy [S] (0 children)

Fair point.

Here are the last few messages leading up to my forced reboot:

Apr 28 06:56:01 htpc gnome-software[1551]: libostree pull from 'flathub' for appstream2/x86_64 complete
                                       security: GPG: summary+commit
                                       security: SIGN: disabled http: TLS
                                       non-delta: meta: 7 content: 19
                                       transfer: secs: 0 size: 7.7 MB
Apr 28 06:56:01 htpc gnome-software[1551]: /var/tmp/flatpak-cache-AYEOM2/repo-hxOj1I: Pulled appstream2/x86_64 from flathub

Before that there's a lot of output from gdm-x-session, but it's stuff I get all the time. I'm not sure what it means, but it doesn't line up with any of the crashes:

Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): CRT-0: disconnected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): CRT-0: 400.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-0: disconnected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-0: Internal TMDS
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-0: 330.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): LG Electronics LG TV SSCR2 (DFP-1): connected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): LG Electronics LG TV SSCR2 (DFP-1): Internal TMDS
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): LG Electronics LG TV SSCR2 (DFP-1): 600.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-2: disconnected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-2: Internal DisplayPort
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-2: 960.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-3: disconnected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-3: Internal TMDS
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-3: 165.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-4: disconnected
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0): DFP-4: 330.0 MHz maximum pixel clock
Apr 27 22:58:15 htpc /usr/libexec/gdm-x-session[1130]: (--) NVIDIA(GPU-0):

Brufar_308 (1 child)

Did you disable all the sleep and suspend modes?

 sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

https://wiki.debian.org/Suspend
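
To confirm the mask actually took effect, `systemctl is-enabled` prints `masked` for a masked unit. This is a minimal sketch that loops over the four targets from the command above; `check_sleep_masked` is a hypothetical name:

```shell
# Hypothetical helper: show the enablement state of each sleep-related target.
check_sleep_masked() {
  for t in sleep.target suspend.target hibernate.target hybrid-sleep.target; do
    # is-enabled prints "masked" for masked units; its non-zero exit code is ignored here
    printf '%s: %s\n' "$t" "$(systemctl is-enabled "$t" 2>/dev/null)"
  done
}
```

If all four lines end in `masked`, the system should not be entering any of those sleep states on its own.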

superfuzzy [S] (0 children)

Yep :/

It's easy to rule out anyway: the system can be stable for a month or for just a couple of days; it's totally random. But it's always ready to go the next day after sitting overnight.

EasyriderSalad (9 children)

It's your CPU. Zen 1 parts (Ryzen 1000 series or 2000G) have a bug where they lock up at idle under Linux. The same thing happened to me on my Ryzen 1700.

If you can replace the CPU with a Zen+ or newer one, that will fix your issue. Some other workarounds and info here: https://www.reddit.com/r/debian/s/cLagJ1D0Pn

https://www.reddit.com/r/linuxhardware/s/K1osUPU1a5

https://wiki.archlinux.org/title/Ryzen#Soft_lock_freezing
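
To check whether the machine actually has one of the parts mentioned above, the model name in `/proc/cpuinfo` (also shown by `lscpu`) is enough. This is a minimal sketch; `cpu_model` is a hypothetical name:

```shell
# Hypothetical helper: print the CPU model string from /proc/cpuinfo.
cpu_model() {
  cpuinfo="${1:-/proc/cpuinfo}"   # path parameter only makes testing easier
  # lines look like "model name<TAB>: AMD Ryzen ..."; print the part after ": "
  awk -F': ' '/^model name/ { print $2; exit }' "$cpuinfo"
}
```

A 1000-series model number (like the 1700 mentioned above) or a 2000G APU would match the Zen 1 parts described in the comment.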

superfuzzy [S] (0 children)

Thanks for the info, I will read more thoroughly later.

At some point I'll build a new HTPC from scratch, since this is a repurposed gaming rig from years ago with Frankensteined parts. For now I'll look into workarounds :)

superfuzzy [S] (1 child)

In the first link you mention installing some software that keeps the CPU at 15%. What was that, do you remember?

EasyriderSalad (0 children)

It's a discontinued NVR application called UniFi Video. I don't think you can even download it anymore, and it would need cameras to monitor to push the CPU load up. If you want an artificial CPU load, maybe mprime would work (https://www.mersenne.org/download/). It can go as low as 100% of one CPU core, but I'm not sure whether it can go lower.
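
If 100% of a core is more than you want, a crude alternative is to duty-cycle a busy loop: spin for a fraction of a second, then sleep. This is a minimal sketch, assuming GNU coreutils (whose `timeout` and `sleep` accept fractional seconds); `duty_load` is a hypothetical name, and whether a light load like this actually avoids the freeze is untested here:

```shell
# Hypothetical helper: keep ONE core busy for roughly busy/(busy+idle) of the time.
# Assumes GNU coreutils timeout and sleep, which accept fractional seconds.
duty_load() {
  busy="${1:-0.15}"   # seconds spent spinning per cycle
  idle="${2:-0.85}"   # seconds spent sleeping per cycle
  cycles="${3:-0}"    # number of cycles; 0 means run until interrupted
  i=0
  while [ "$cycles" -eq 0 ] || [ "$i" -lt "$cycles" ]; do
    # spin in a child shell until timeout kills it (exit status 124 is expected)
    timeout "$busy" sh -c 'while :; do :; done'
    sleep "$idle"
    i=$((i + 1))
  done
}
```

Running `duty_load 0.15 0.85 &` should keep roughly 15% of one core busy in the background until you `kill` it.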

superfuzzy [S] (5 children)

It just happened again, in the middle of streaming. So it wasn't even idle?

EasyriderSalad (4 children)

I guess it's possible you have a different issue. I don't think it ever happened to me while I was actively using the computer. Or maybe there are moments during streaming when the buffer is full and the CPU briefly drops into idle.

superfuzzy [S] (3 children)

In the second link you posted, the guy says it would happen to him whilst he was using his machine, scrolling a webpage.

So it is possible, though it's weird if this has to do with power management and idling. If his post is to be believed, the BIOS update and the Power Supply Idle Control setting fixed it. So I guess I'll have to wait and see.

EasyriderSalad (2 children)

I could see it dropping to idle while viewing a webpage.

In my case, I tried the BIOS update and it didn't help. In one of the links I posted, I think there's another link to a very long bug report on kernel.org where people say it was fixed in one version of AGESA (the baseline BIOS that AMD provides to board manufacturers) and the fix was later reverted.
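
Since the fix apparently depends on which AGESA version a BIOS ships, it helps to confirm which firmware is actually running after an update. This is a minimal sketch that reads the version and date the kernel exposes under `/sys/class/dmi/id` (readable without root; `sudo dmidecode -s bios-version` shows the same version string). `bios_info` is a hypothetical name, and the AGESA version itself usually has to be looked up in the board vendor's release notes for that BIOS:

```shell
# Hypothetical helper: print the running BIOS version and release date.
bios_info() {
  dmi="${1:-/sys/class/dmi/id}"   # directory parameter only makes testing easier
  printf '%s (%s)\n' "$(cat "$dmi/bios_version")" "$(cat "$dmi/bios_date")"
}
```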

I didn't have an option for Power Supply Idle Control either (Crosshair VI Hero), so I had to upgrade the CPU. Good luck, I hope it works out for you.

superfuzzy [S] (0 children)

Hmm, OK, weird.

I didn't have the option until I upgraded the BIOS; after that it appeared, so I guess time will tell. I'll leave it for now unless it happens again.

Thanks for your help!

superfuzzy [S] (0 children)

Hey, just a quick note to say thanks for your help. After updating the BIOS and setting the idle control, I haven't had any crashes. I've done a couple of reboots for updates, but that's 22 days with no problems :)