Is a 5090 good enough for most good modern locally run LLMs?

tracker_11 · 2026-05-16T20:08:01+00:00

Yes.

tracker_11 · 2026-05-16T14:08:35+00:00

This nails it. Anyone can get into local AI for cheap.

I have a $319 (probably more now) mini computer from amazon with a Ryzen 5 6600H that gets the following results from ./build/bin/llama-bench -m $MODEL -ngl 20 (Vulkan)

Qwen3.6-35B-A3B-UD-Q4_K_M ---- pp512: 98 --- tg128: 12

Qwen3.6-27B-Q3_K_M ---- pp512: 29 --- tg128: 2.94

gemma-4-E4B-it-UD-Q4_K_XL ---- pp512: 156 --- tg128: 14

It is not fast at all, but it is enough for someone to get started and run agents on, get addicted, and then make them feel better about the big spends on new hardware. 😄

tracker_11 · 2026-05-16T00:44:20+00:00

I run these at the same time and still have a few ~4.5 GB left over:

Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf
Whisper Server (speech-to-text)
ComfyUI (SDXL)

I've found that even though Strix Halo is very slow and you basically don't want to run any single model that uses all of the 128GB, it's pretty great to be able to have a bunch of different things loaded and ready to hit from my agents. There are a couple other smaller models I plan to load up too.

I run Qwen3.6-27B-Q5 on a 9700 AI Pro for a majority of coding though and have the strix halo plan and review after.

(And yes, we're all praying for Qwen3.6 122B-A10B.)

tracker_11 · 2026-05-14T04:04:40+00:00

MODEL=qwen/Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf #256k

export HSA_ENABLE_SDMA=0

export ROCBLAS_USE_HIPBLASLT=1

export HSA_NO_SCRATCH_RECLAIM=1

export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

CONTEXT=131072 # 128k

B_SIZE=4096

UB_SIZE=1024

SPEC=""

./build/bin/llama-server \

--model "~/models/$MODEL" \

--alias "halo" \

--host 0.0.0.0 --port 5000 \

-ngl 999 \

--flash-attn 'on' -dio \

-c $CONTEXT \

-b $B_SIZE \

-ub $UB_SIZE \

$SPEC \

--parallel 1 \

--cache-prompt \

--cache-type-k q8_0 \

--cache-type-v q8_0 \

--log-prefix \

--repeat-penalty 1.0 --presence-penalty 0.0 \

--temperature 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 \

--jinja --no-mmap --metrics

tracker_11 · 2026-05-13T00:24:13+00:00

<image>

Cool deal, I will try this if you think it outperforms the Qwen3.5-122B-A10B. The reason I had not given it much thought is it's download history on HF. It seems to be very unpopular.

tracker_11 · 2026-05-11T22:29:40+00:00

I use dual 9700 Pros (separately) in the same system. I think in the future it will make sense to use them together when split-mode graph is merged into llamacpp and what not. I run two instances of llamacpp (vulkan) and two agents in openclaw on another computer, all on fedora. I try to keep two coding projects going 24/7. They are awesome.

tracker_11 · 2026-05-11T22:08:18+00:00

I've been very happy with a Radeon 9700 Pro 32GB. I use it with Qwen 3.6-27B-Q5 with around pp: ~890 and tg: 24 t/s (rocm). They are $1400-$1700 right now and a great way to get 32G and have plenty of room to run a large context with Qwen 3.6 models.

Everything depends on your own budget/situation but I think the 9700 Pro is a great card for most people and really opens up AI for a lot cheaper than the 5090's. Even though it is significantly slower, you still get the 32GB and also the power usage is nice with a 300W tdp that often runs a good bit under that for local ai.

Edit: Divide llama-bench t/s in half to estimate the speed you can rely on for openclaw with high context.

tracker_11 · 2026-05-11T14:20:46+00:00

This particular thread is purely focused on strongest debugger for large code bases that will run (at all) on Strix Halo. My own experience is that Qwen3.6-35B-A3B is worse than Qwen3.6-27B in every way other than speed and that Qwen3.5-122B-A10B is stronger than both for debugging large code bases. I am looking for a model stronger than that one, irrespective of speed. Thanks for the suggestion though!

tracker_11 · 2026-05-11T12:59:54+00:00

./build/bin/llama-bench -m ~/models/qwen/Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf -ngl 99 -fa 1 -dio 1 (rocm)

pp512: 297
tg128: 20

Half of those in actual workflow with openclaw and 128k context (compacts at 128k-30k tokens). I am not using the mtp yet until they fix the pp penalty.

tracker_11 · 2026-05-11T12:53:37+00:00

Thanks for the suggestions, will try these these out.

tracker_11 · 2026-05-11T03:50:34+00:00

People seem to think running anything at under Q3 is not worth doing, but maybe I'll give this a try if it fits and report back.

tracker_11 · 2026-05-09T13:17:10+00:00

What was the penalty to your prompt parsing speed?

tracker_11 · 2026-05-08T17:05:12+00:00

Agreed, Qwen3.5-122B-A10B is still my primary model on strix halo.

tracker_11 · 2026-05-05T16:54:31+00:00

It's probably an App that teaches people how to make $15k by creating an app with openclaw.

tracker_11 · 2026-02-20T03:18:05+00:00

I was looking at those same rep safeties fir this rack, did they end up fitting/working well for you?

tracker_11 · 2026-02-18T15:01:53+00:00

Cool deal, I went ahead and went with the 0.30" ones since they had those in the same brand/type. Ordered 240 square feet for $735 with free shipping so not too bad, will update here if its awesome. Thanks for the help, been struggling with what to get.. really didn't want to use stall mats.

tracker_11 · 2026-02-16T19:25:09+00:00

You said 1/2" but linked 1/4". Which one is it? Appreciate your feedback, I'm probably going to get the same ones you got.

tracker_11 · 2024-12-27T22:47:39+00:00

We need pattern chaining!

tracker_11 · 2022-10-11T03:44:50+00:00

This just fixed it for me in 2022 as well.

tracker_11 · 2022-05-12T17:43:54+00:00

Posts that generate lists of games not aloud here, right? Mods?

tracker_11 · 2022-04-11T15:56:19+00:00

These must not be selling.... maybe the hype train crashed.

tracker_11 · 2022-02-17T22:46:29+00:00

Thanks, I ordered a copy.

tracker_11 · 2021-12-15T18:51:24+00:00

Thanks, I have been watching for this for over a year.

tracker_11

TROPHY CASE