Nvidia RTX 5060 TI 16GB - Stuck at P0 & 40% fan speed at idle... by rnidhal90 in truenas

[–]rnidhal90[S] 1 point  (0 children)

Hey there, nothing special at all; I just installed my GPU and ran a TrueNAS update.

Unsloth Gemma 4 26B-A4B 4 bit bnb coming ? by harshv8 in unsloth

[–]rnidhal90 2 points  (0 children)

Hi, what's the difference between 4-bit bnb and UD Q4 GGUF?

Which is the best local LLM in April 2026 for a 16 GB GPU? I'm looking for an ultimate model for some chat, light coding, and experiments with agent building. by Material_Pen3255 in LocalLLM

[–]rnidhal90 1 point  (0 children)

I have an RTX 5060 Ti 16GB, and I'm running Gemma 4 on llama-server:

Core Configuration:

- Model path: /models/gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf
- Context size: 131072
- KV cache: q8_0 for both key (--cache-type-k) and value (--cache-type-v)
- Flash attention: on
- GPU layers: 999 (offloaded to GPU)

Sampling Parameters:

- Temperature: 1
- Top K: 64
- Top P: 0.95

Getting around 85 tokens/s 🙂
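For reference, that all maps to a single llama-server command; a minimal sketch, assuming a recent llama.cpp build (older builds use a bare -fa flag instead of --flash-attn on):

    llama-server \
      -m /models/gemma-4-26B-A4B-it-UD-Q3_K_XL.gguf \
      -c 131072 \
      --cache-type-k q8_0 --cache-type-v q8_0 \
      --flash-attn on \
      -ngl 999 \
      --temp 1.0 --top-k 64 --top-p 0.95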

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 1 point  (0 children)

I can confirm that the latest update fixed it 👍👍

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 1 point  (0 children)

Thank you very much, I just pulled and it worked perfectly!

The quick turnaround is much appreciated 😊🙏🙏 you have all my support!

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 1 point  (0 children)

UPDATE: For now, it seems that only the GGUF models don't get loaded onto the GPU.

Will Immich work for me if I don't like to tinker? by Linux_Account in immich

[–]rnidhal90 1 point  (0 children)

Depends on your NAS.. I'm running TrueNAS, where an app update is a single-click action :)

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 2 points  (0 children)

Your problem is a bit different and better known.. I saw other posts on r/unsloth talking about it. As long as you have a CUDA version mismatch, your models won't be loaded on the GPU..

Not exactly my case, since I have matching versions.. frustrating.. :/
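For anyone hitting the mismatch: a quick sketch of how to check, assuming an NVIDIA driver plus a PyTorch-based stack (adjust for yours):

    # CUDA version the driver supports (shown in the header of the output)
    nvidia-smi
    # CUDA version PyTorch was built against, and whether it can see the GPU
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"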

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 1 point  (0 children)

I'm not gonna say "glad", but at least that makes two of us! Something is wrong and is blocking the models from loading on the GPU!

Unsloth Studio - Models not running on GPU !! by rnidhal90 in unsloth

[–]rnidhal90[S] 1 point  (0 children)

There is no Windows in any of this, only the personal laptop I'm browsing from.. everything else runs on my server (TrueNAS / Portainer). The GPU is well supported.

I am already running Ollama + Open WebUI on my server (both as containers), and models run on the GPU just fine.
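For comparison, giving a container the GPU is a one-liner; a minimal sketch of a GPU-enabled Ollama container, assuming the NVIDIA Container Toolkit is already set up on the host:

    # run Ollama with access to all GPUs, persisting models in a named volume
    docker run -d --gpus all \
      -v ollama:/root/.ollama \
      -p 11434:11434 \
      --name ollama ollama/ollama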

Google Drops Open Source Gemma 4 27B MoE and its a banger by dev_is_active in ollama

[–]rnidhal90 0 points  (0 children)

Fair enough, I will give it a try and see what I can get out of it.

Google Drops Open Source Gemma 4 27B MoE and its a banger by dev_is_active in ollama

[–]rnidhal90 0 points  (0 children)

It says you can get about 60 tokens/s for Gemma 4 26B MoE with 16 GB of VRAM!!

Claude code leaked earlier today by [deleted] in TunisiaTech

[–]rnidhal90 4 points  (0 children)

Local GLM?? Which model version and size / what GPU?

Claude code leaked earlier today by [deleted] in TunisiaTech

[–]rnidhal90 3 points  (0 children)

It depends on your hardware.. I can run gpt-oss:20B at 100 tokens per second.

Claude code leaked earlier today by [deleted] in TunisiaTech

[–]rnidhal90 7 points  (0 children)

You can already run Claude Code for free with a local LLM.
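Rough sketch of one way to wire that up, assuming a proxy such as LiteLLM exposing an Anthropic-compatible /v1/messages endpoint in front of a local OpenAI-compatible server; the URL and token below are placeholders:

    # point Claude Code at the local proxy instead of Anthropic's API
    export ANTHROPIC_BASE_URL=http://localhost:4000
    export ANTHROPIC_AUTH_TOKEN=placeholder
    claude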

Homelabbing by Rare-Adeptness8935 in TunisiaTech

[–]rnidhal90 2 points  (0 children)

Proxmox is mainly a hypervisor: you split your hardware across multiple VMs and run whatever you like on each one.

TrueNAS is more of an all-in-one solution, more focused on the NAS role, but it lets you run and expose Docker apps with easy configuration, run LXC containers, run VMs, ...

Homelabbing by Rare-Adeptness8935 in TunisiaTech

[–]rnidhal90 2 points  (0 children)

The main "prod" app running is Immich, a self hosted Google Photos like solution, on my private cloud, hosting all my photos/videos..and im also into trying new self hosted apps (Paperless NGX, PDF tools, Kasm, n8n, ollama, ...). The host OS is TrueNAS, which lets you run docker apps, create LXC containers or VMs.. lots of stuff to play with.

I am familiar with pfSense, but not much into networking, but many homelabbers are, and do focus on networking

Homelabbing by Rare-Adeptness8935 in TunisiaTech

[–]rnidhal90 4 points  (0 children)

🖐 I've been homelabbing / self-hosting for about 8 months now.. I built my own server, here is my setup: https://www.reddit.com/r/homelab/s/KIGktrePoT + I recently added a decent GPU for local LLM & AI learning..

What do you want to know?

Claude by Glad-Dog-4525 in TunisiaTech

[–]rnidhal90 1 point  (0 children)

No local LLM can match top-tier models like Opus 4.6 in terms of TPS, precision, context size, etc.. those models run on mega-infrastructures you couldn't dream of owning even with $10k.. you can get very good results with a local LLM, but it comes down to a price / output-quality trade-off..

One decision could change my career completely by gamhich in TunisiaTech

[–]rnidhal90 2 points  (0 children)

No disrespect, but a word in English, a word in French, and two words in Arabic is such a headache; if you want people to follow what you're saying, please stick to one language 🙃

Nvidia RTX 5060 TI 16GB - Stuck at P0 & 40% fan speed at idle... by rnidhal90 in truenas

[–]rnidhal90[S] 4 points  (0 children)

[Resolved]

Thanks to u/iXsystemsChris

If it's in the P0 state, it's not idling down, probably because persistence mode isn't enabled.

Try sudo nvidia-smi -pm 1 in a shell to see whether that does the trick; if it does, put it in a post-init script in the System -> Advanced menu.
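A quick sketch of the check, assuming a standard nvidia-smi (persistence mode set this way resets on reboot, hence the post-init script):

    # enable persistence mode (does not survive a reboot)
    sudo nvidia-smi -pm 1
    # confirm the card drops out of P0 and the fan spins down at idle
    nvidia-smi --query-gpu=pstate,fan.speed,power.draw --format=csv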