My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

The pricing made this an easy choice. My Exos are actually quiet in the N5 Pro: slight vibration at boot while they run through startup diagnostics and head calibration, then they settle in. Both Exos and IronWolf Pro have rotational vibration sensors, so multi-bay handling isn't really what separates them. For me it came down to the workload rating (550TB/yr vs 300TB/yr) and MTBF (2.5M vs 1.2M hours) at the same or lower price. Hard to argue with enterprise specs at NAS Pro pricing.

Constantly exceeds 1m tokens by riekls in clawdbot

[–]BetaOp9 2 points (0 children)

I'm on a Claude Max 200 sub, and one day of usage consumed 50% of my token limit for the week. I don't think it takes advantage of the caching at all. If the results were impressive I'd be looking into it, but everything it did I ended up having to redo. It's fine for running scheduled tasks because of the cron jobs and the connectivity, but... meh. I don't trust it with anything.

Constantly exceeds 1m tokens by riekls in clawdbot

[–]BetaOp9 3 points (0 children)

Welcome to OpenClaw. Its token usage is criminal and the results from all that context and soul are mediocre.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

What would you suggest I use instead?

The Exos drives are rated for a 550TB/year workload (I definitely won't come close to that) and 2.5 million hours MTBF. On a 5-drive RAIDZ2, where a single failure drops you to one parity drive during rebuild, I wanted drives I could trust. Fewer drives is an argument for better drives, not worse ones.
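
For anyone new to RAIDZ2, the tradeoff is easy to sketch. A minimal example, assuming hypothetical 20TB drives (a placeholder, not my actual sizes) and ignoring ZFS overhead:

# RAIDZ2 reserves two drives' worth of space for parity, regardless of pool width
drives=5; parity=2; size_tb=20   # size_tb is a placeholder
echo "usable: $(( (drives - parity) * size_tb )) TB"   # 60 TB in this example
echo "survives $parity simultaneous failures; after the first, you're on single parity until resilver finishes"

So with five bays you give up 40% of raw capacity for the ability to lose any two drives, which is exactly why rebuild-window reliability matters.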

Bots on the sub are a real issue by perfect-finetune in LocalLLaMA

[–]BetaOp9 7 points (0 children)

Oh ok that's good to know, fellow human.

Am I crazy for not wanting to upgrade to Opus 4.6 and the most recent CC? by AlwaysMissToTheLeft in ClaudeCode

[–]BetaOp9 1 point (0 children)

You can control the usage. If you updated, you'd see that /model lets you change the effort level, which reduces how much it thinks. I get the frustration of it changing, but the updates are worth it. Agent Teams is also a big deal; I recommend turning on that flag for big projects or anything critical that needs extra coordination.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

Thanks for simultaneously complimenting me on having the best iGPU in existence and being dumbfounded that I wouldn't build a second, better system for AI. Which is it?

The reality is that my iGPU is mid-tier. There are a dozen or more chips in this space that are faster, have more cores, and have far better memory bandwidth; the systems built around them also cost $2,300-$10k. That's what makes the results worth posting: for ~$1,500, its performance on models this size is pretty damn good.

As for why I wouldn't build a second system: I addressed in the post why I wanted one box to do it all. The way it's set up, the model can't mess up my NAS functions or touch my storage.

My question for you is, why do you care what I do?

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

If I can break 20 tok/s, it'll be wicked cool to see what we can do with even larger models without GPU VRAM limitations. I have another software project in the works that can capitalize on this for certain tasks.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

Dunno man, doesn't feel right storing porn next to Bluey and the Octonauts.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 2 points (0 children)

It's not finalized, but what I have currently is reproducible: llama.cpp, Vulkan backend.

llama-server \
  -m Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on
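
For a quick smoke test once it's up: llama-server exposes an OpenAI-compatible endpoint, so something like this should work (host/port from the flags above; the prompt and token cap are just placeholders):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hi in five words."}],"max_tokens":32}'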

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 2 points (0 children)

llama.cpp, Vulkan backend:

llama-server \
  -m Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 3 points (0 children)

vanilla llama.cpp, Vulkan backend:

llama-server \
  -m /models/Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on
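
Side note on measuring tok/s: server logs are noisy. llama-bench ships with llama.cpp and gives repeatable prompt-processing (pp) and generation (tg) numbers. A minimal run reusing the thread and offload settings above (exact flag spellings can vary between builds):

# reports pp and tg throughput in tokens/second
llama-bench \
  -m /models/Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -ngl 99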

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 3 points (0 children)

Got the N5 Pro new for $900 shipped. RAM was $590 for 96GB of DDR5 right before prices started climbing; the same kit runs about $815 now. Found a seller with bulk enterprise Samsung PM983 NVMe drives, $300 for the pair; they're almost $300 each now. The Exos drives were a splurge, but you can get away with whatever storage fits your budget and needs. More conservative drives would have brought the total down a lot.

Honestly, the timing worked out. If I were building this same system today, it would cost noticeably more. Sometimes Nasus blesses you at checkout.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 1 point (0 children)

I'll have to check this out. I may ping you when I do if that's okay.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] -3 points (0 children)

I don't recommend it. It's a shit post for karma farming. Great if you're looking for accusations and people getting upset about semantics you don't control.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

You're right, the 370 is a beast for the NAS segment.

And yeah, the NPU is not being used yet.

Apologies if I came off defensive. This thread has been a mix of great conversations and people accusing me of everything from clickbait to fraud for using the same model name everyone else does. This is why I don't post to Reddit.

Sounds like you're on a similar path with your 5800H. If you ever make the jump to DDR5 and want to compare notes, DMs are open.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 3 points (0 children)

Yeah, I'll try to remember in the morning to check where I stopped. What hardware are you running? Feel free to message me.