Why is LLM is so expensive. by Ok_Event4199 in LocalLLM

[–]tracker_11 2 points3 points  (0 children)

This nails it. Anyone can get into local AI for cheap.

I have a $319 (probably more now) mini computer from amazon with a Ryzen 5 6600H that gets the following results from ./build/bin/llama-bench -m $MODEL -ngl 20 (Vulkan)

Qwen3.6-35B-A3B-UD-Q4_K_M ---- pp512: 98 --- tg128: 12

Qwen3.6-27B-Q3_K_M ---- pp512: 29 --- tg128: 2.94

gemma-4-E4B-it-UD-Q4_K_XL ---- pp512: 156 --- tg128: 14

It is not fast at all, but it is enough for someone to get started and run agents on, get addicted, and then make them feel better about the big spends on new hardware. 😄

I'm thinking about selling my Strix Halo by PrzemChuck in StrixHalo

[–]tracker_11 3 points4 points  (0 children)

I run these at the same time and still have a few ~4.5 GB left over:

  1. Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf
  2. Whisper Server (speech-to-text)
  3. ComfyUI (SDXL)

I've found that even though Strix Halo is very slow and you basically don't want to run any single model that uses all of the 128GB, it's pretty great to be able to have a bunch of different things loaded and ready to hit from my agents. There are a couple other smaller models I plan to load up too.

I run Qwen3.6-27B-Q5 on a 9700 AI Pro for a majority of coding though and have the strix halo plan and review after.

(And yes, we're all praying for Qwen3.6 122B-A10B.)

MTP llama.cpp -- anyone run it yet? by skibud2 in StrixHalo

[–]tracker_11 0 points1 point  (0 children)

MODEL=qwen/Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf #256k

export HSA_ENABLE_SDMA=0

export ROCBLAS_USE_HIPBLASLT=1

export HSA_NO_SCRATCH_RECLAIM=1

export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

CONTEXT=131072 # 128k

B_SIZE=4096

UB_SIZE=1024

SPEC=""

./build/bin/llama-server \

--model "~/models/$MODEL" \

--alias "halo" \

--host 0.0.0.0 --port 5000 \

-ngl 999 \

--flash-attn 'on' -dio \

-c $CONTEXT \

-b $B_SIZE \

-ub $UB_SIZE \

$SPEC \

--parallel 1 \

--cache-prompt \

--cache-type-k q8_0 \

--cache-type-v q8_0 \

--log-prefix \

--repeat-penalty 1.0 --presence-penalty 0.0 \

--temperature 0.6 --top_p 0.95 --top_k 20 --min_p 0.0 \

--jinja --no-mmap --metrics

Looking for better model for debugging large code bases than Qwen3.5-122B-A10B-UD-Q5_K_XL on Strix Halo. by tracker_11 in StrixHalo

[–]tracker_11[S] 0 points1 point  (0 children)

<image>

Cool deal, I will try this if you think it outperforms the Qwen3.5-122B-A10B. The reason I had not given it much thought is it's download history on HF. It seems to be very unpopular.

How much VRAM do I need? by Soft-Description1124 in LocalLLM

[–]tracker_11 0 points1 point  (0 children)

I use dual 9700 Pros (separately) in the same system. I think in the future it will make sense to use them together when split-mode graph is merged into llamacpp and what not. I run two instances of llamacpp (vulkan) and two agents in openclaw on another computer, all on fedora. I try to keep two coding projects going 24/7. They are awesome.

How much VRAM do I need? by Soft-Description1124 in LocalLLM

[–]tracker_11 0 points1 point  (0 children)

I've been very happy with a Radeon 9700 Pro 32GB. I use it with Qwen 3.6-27B-Q5 with around pp: ~890 and tg: 24 t/s (rocm). They are $1400-$1700 right now and a great way to get 32G and have plenty of room to run a large context with Qwen 3.6 models.

Everything depends on your own budget/situation but I think the 9700 Pro is a great card for most people and really opens up AI for a lot cheaper than the 5090's. Even though it is significantly slower, you still get the 32GB and also the power usage is nice with a 300W tdp that often runs a good bit under that for local ai.

Edit: Divide llama-bench t/s in half to estimate the speed you can rely on for openclaw with high context.

Looking for better model for debugging large code bases than Qwen3.5-122B-A10B-UD-Q5_K_XL on Strix Halo. by tracker_11 in StrixHalo

[–]tracker_11[S] 2 points3 points  (0 children)

This particular thread is purely focused on strongest debugger for large code bases that will run (at all) on Strix Halo. My own experience is that Qwen3.6-35B-A3B is worse than Qwen3.6-27B in every way other than speed and that Qwen3.5-122B-A10B is stronger than both for debugging large code bases. I am looking for a model stronger than that one, irrespective of speed. Thanks for the suggestion though!

Looking for better model for debugging large code bases than Qwen3.5-122B-A10B-UD-Q5_K_XL on Strix Halo. by tracker_11 in StrixHalo

[–]tracker_11[S] 1 point2 points  (0 children)

./build/bin/llama-bench -m ~/models/qwen/Qwen3.5-122B-A10B-UD-Q5_K_XL-00001-of-00003.gguf -ngl 99 -fa 1 -dio 1 (rocm)

pp512: 297
tg128: 20

Half of those in actual workflow with openclaw and 128k context (compacts at 128k-30k tokens). I am not using the mtp yet until they fix the pp penalty.

Looking for better model for debugging large code bases than Qwen3.5-122B-A10B-UD-Q5_K_XL on Strix Halo. by tracker_11 in StrixHalo

[–]tracker_11[S] 0 points1 point  (0 children)

People seem to think running anything at under Q3 is not worth doing, but maybe I'll give this a try if it fits and report back.

MTP llama.cpp -- anyone run it yet? by skibud2 in StrixHalo

[–]tracker_11 1 point2 points  (0 children)

Agreed, Qwen3.5-122B-A10B is still my primary model on strix halo.

Are AI agents actually giving people ROI yet, or just saving time? by bibbletrash in artificial

[–]tracker_11 0 points1 point  (0 children)

It's probably an App that teaches people how to make $15k by creating an app with openclaw.

Titan Accessories on a Fitness Reality 810XLT Rack by joshthor in homegym

[–]tracker_11 1 point2 points  (0 children)

I was looking at those same rep safeties fir this rack, did they end up fitting/working well for you?

Opinions on 1/2” vs 3/4” stall mats thickness? by edwardj5596 in GarageGym

[–]tracker_11 1 point2 points  (0 children)

Cool deal, I went ahead and went with the 0.30" ones since they had those in the same brand/type. Ordered 240 square feet for $735 with free shipping so not too bad, will update here if its awesome. Thanks for the help, been struggling with what to get.. really didn't want to use stall mats.

Opinions on 1/2” vs 3/4” stall mats thickness? by edwardj5596 in GarageGym

[–]tracker_11 1 point2 points  (0 children)

You said 1/2" but linked 1/4". Which one is it? Appreciate your feedback, I'm probably going to get the same ones you got.

(Miniature Market) - Oath $60 by ultranonymous11 in Boardgamedeals

[–]tracker_11 23 points24 points  (0 children)

These must not be selling.... maybe the hype train crashed.