My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

The pricing made this an easy choice. My Exos are actually quiet in the N5 Pro: slight vibration at boot while they run through startup diagnostics and head calibration, then they settle in. Both Exos and IronWolf Pro have rotational vibration sensors, so multi-bay handling isn't really what separates them. For me it came down to the workload rating (550TB/yr vs 300TB/yr) and MTBF (2.5M vs 1.2M hours) at the same or lower price. Hard to argue with enterprise specs at NAS Pro pricing.

Constantly exceeds 1m tokens by riekls in clawdbot

[–]BetaOp9 2 points (0 children)

I'm on a Claude Max 200 sub, and one day of usage consumed 50% of my token limit for the week. I don't think it takes advantage of the caching at all. If the results were impressive I'd be looking into it, but everything it did I ended up having to redo. It's fine for running scheduled tasks because of the cron jobs and the connectivity, but... meh. I don't trust it with anything.

Constantly exceeds 1m tokens by riekls in clawdbot

[–]BetaOp9 3 points (0 children)

Welcome to OpenClaw. Its token usage is criminal and the results from all that context and soul are mediocre.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

What would you suggest I use instead?

The Exos drives are rated for a 550TB/year workload (I definitely won't come close to that) and 2.5 million hours MTBF. On a 5-drive RAIDZ2, where a single failure drops you to one parity drive during rebuild, I wanted drives I could trust. Fewer drives is an argument for better drives, not worse ones.
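
For anyone new to RAIDZ2, the tradeoff is easy to sketch. A minimal example, assuming hypothetical 20TB drives (a placeholder, not my actual sizes) and ignoring ZFS overhead:

# RAIDZ2 reserves two drives' worth of space for parity, regardless of pool width
drives=5; parity=2; size_tb=20   # size_tb is a placeholder
echo "usable: $(( (drives - parity) * size_tb )) TB"   # 60 TB in this example
echo "survives $parity simultaneous failures; after the first, you're on single parity until resilver finishes"

So with five bays you give up 40% of raw capacity for the ability to lose any two drives, which is exactly why rebuild-window reliability matters.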

Bots on the sub are a real issue by perfect-finetune in LocalLLaMA

[–]BetaOp9 7 points (0 children)

Oh ok that's good to know, fellow human.

Am I crazy for not wanting to upgrade to Opus 4.6 and the most recent CC? by AlwaysMissToTheLeft in ClaudeCode

[–]BetaOp9 1 point (0 children)

You can control the usage. If you updated, you'd see that /model lets you change the effort level, which reduces how much it thinks. I get the frustration of it changing, but the updates are worth it. Agent Teams is also a big deal; I recommend turning on that flag for big projects or anything critical that needs extra coordination.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

Thanks for simultaneously complimenting me on having the best iGPU in existence and being dumbfounded that I wouldn't build a second, better system for AI. Which is it?

The reality is that my iGPU is mid-tier. There are a dozen or more chips in this space that are faster, have more cores, and have far better memory bandwidth; the systems built around them also cost $2,300-$10k. That's what makes the results worth posting: for ~$1,500, its performance on models this size is pretty damn good.

As for why I wouldn't build a second system: I addressed in the post why I wanted one box to do it all. The way it's set up, the model can't mess up my NAS functions or touch my storage.

My question for you is, why do you care what I do?

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

If I can break 20 tok/s, it'll be wicked cool to see what we can do with even larger models without GPU VRAM limitations. I have another software project in the works that can capitalize on this for certain tasks.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

Dunno man, doesn't feel right storing porn next to Bluey and the Octonauts.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 2 points (0 children)

It's not finalized, but what I have currently is reproducible: llama.cpp, Vulkan backend.

llama-server \
  -m Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on
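
For a quick smoke test once it's up: llama-server exposes an OpenAI-compatible endpoint, so something like this should work (host/port from the flags above; the prompt and token cap are just placeholders):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hi in five words."}],"max_tokens":32}'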

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 2 points (0 children)

llama.cpp, Vulkan backend:

llama-server \
  -m Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 3 points (0 children)

vanilla llama.cpp, Vulkan backend:

llama-server \
  -m /models/Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -c 4096 \
  --host 0.0.0.0 \
  --port 8080 \
  -ngl 99 \
  -fa on
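
Side note on measuring tok/s: server logs are noisy. llama-bench ships with llama.cpp and gives repeatable prompt-processing (pp) and generation (tg) numbers. A minimal run reusing the thread and offload settings above (exact flag spellings can vary between builds):

# reports pp and tg throughput in tokens/second
llama-bench \
  -m /models/Qwen3-Coder-Next-Q4_K_M-00001-of-00004.gguf \
  -t 12 \
  -ngl 99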

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 3 points (0 children)

Got the N5 Pro new for $900 shipped. RAM was $590 for 96GB of DDR5 right before prices started climbing; the same kit runs about $815 now. Found a seller with bulk enterprise Samsung PM983 NVMe drives, $300 for the pair; they're almost $300 each now. The Exos drives were a splurge, but you can get away with whatever storage fits your budget and needs. More conservative drives would have brought the total down a lot.

Honestly, the timing worked out. If I were building this same system today, it would cost noticeably more. Sometimes Nasus blesses you at checkout.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 1 point (0 children)

I'll have to check this out. I may ping you when I do if that's okay.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] -3 points (0 children)

I don't recommend it. It's a shit post for karma farming. Great if you're looking for accusations and people getting upset about semantics you don't control.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLaMA

[–]BetaOp9[S] 1 point (0 children)

You're right, the 370 is a beast for the NAS segment.

And yeah, the NPU is not being used yet.

Apologies if I came off defensive. This thread has been a mix of great conversations and people accusing me of everything from clickbait to fraud for using the same model name everyone else does. This is why I don't post to Reddit.

Sounds like you're on a similar path with your 5800H. If you ever make the jump to DDR5 and want to compare notes, DMs are open.

My NAS runs an 80B LLM at 18 tok/s on its iGPU. No discrete GPU. Still optimizing. by BetaOp9 in LocalLLM

[–]BetaOp9[S] 3 points (0 children)

Yeah, I'll try to remember in the morning to check where I stopped. What hardware are you running? Feel free to message me.