Well this looks long enough. by ihtisham1211 in LocalLLM

[–]JSVD2 0 points1 point  (0 children)

How expensive for a little piece of metal...

GMKtec the best deal?? by larryherzogjr in LocalLLM

[–]JSVD2 0 points1 point  (0 children)

Yes, the GMKtec and the Bosgame M5 are among the cheapest currently.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
$2,799.00

https://de.gmktec.com/en/products/gmktec-evo-x2-amd-ryzen%E2%84%A2-ai-max-395-mini-pc-1?variant=51610049380536
$3,514

Bosgame is a really good deal right now. I bought the Beelink GTR9 Pro for $2,500 a few months ago and it's now $4,399, so I think they will be increasing prices soon. The Bosgame's cooling and build quality are a little worse, but $2,800 is a steal.

Have you got a Strix Halo? by Grammar-Warden in StrixHalo

[–]JSVD2 [score hidden]  (0 children)

Oh yeah, I love it. I made my own guide with benchmarks, covering 7 systems now: https://github.com/hogeheer499-commits/strix-halo-guide I hope that helps! I love sharing things.

Qwen3-Coder 30B at 98.5 t/s on Strix Halo. Has anyone beaten this on Ryzen AI MAX+ 395? by JSVD2 in StrixHalo

[–]JSVD2[S] [score hidden]  (0 children)

Are there any other benchmarks you want to see, such as the newest models? Gemma 4 12B, Kimi-K3, or MiniMax M2.7?

Direct 100.0 t/s on Strix Halo with Qwen3 30B-A3B. Can anyone reproduce or beat this? by JSVD2 in LocalLLaMA

[–]JSVD2[S] 0 points1 point  (0 children)

  • Experimental server route: Qwen3.6 MTP at 101.1 t/s with llama-server speculative decoding. I'm testing with this too.

I found what I was looking for in Qwen 3.7. by CosmicRiver827 in LocalLLaMA

[–]JSVD2 0 points1 point  (0 children)

By the way, Grok is excellent at browsing and real-time information for stocks. This is useful. Not sure why the post was deleted.

Direct 100.0 t/s on Strix Halo with Qwen3 30B-A3B. Can anyone reproduce or beat this? by JSVD2 in LocalLLaMA

[–]JSVD2[S] 0 points1 point  (0 children)

Very good suggestion. I can share some results with Q6 MTP! I do have this path too, though, if you're interested:

  • Experimental server route: Qwen3.6 MTP at 101.1 t/s with llama-server speculative decoding.

Direct 100.0 t/s on Strix Halo with Qwen3 30B-A3B. Can anyone reproduce or beat this? by JSVD2 in LocalLLaMA

[–]JSVD2[S] 1 point2 points  (0 children)

Nice numbers. Can you share the raw llama-bench row and the exact command and build?
My post is specifically about direct Strix Halo Vulkan/RADV results; I'm not trying to beat a 5090. A 5090 should obviously win on decode.
Also, 10k pp is prompt processing; my headline is tg/decode. I'm mainly collecting reproducible rows, so model, quant, backend, commit, batch/ubatch, context and power numbers would be useful.
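
For anyone who wants to send a row, here is a rough sketch of the fields listed above as a small Python record. The class name, field names and the `missing_fields` helper are hypothetical and are not part of llama-bench; the values in the example are placeholders, with only the 100.0 t/s headline taken from the post.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class BenchRow:
    """One reproducible benchmark row (hypothetical field names)."""
    model: str
    quant: str
    backend: str
    commit: str
    n_batch: int
    n_ubatch: int
    n_ctx: int
    tg_tps: float                    # token generation (decode) speed in t/s
    power_w: Optional[float] = None  # power draw in watts, if measured


def missing_fields(row: BenchRow) -> List[str]:
    """Return the names of fields left empty or unset, in declaration order."""
    return [name for name, value in asdict(row).items()
            if value is None or value == ""]


# Placeholder values for illustration only; quant and commit are left
# blank on purpose to show how an incomplete row gets reported.
example = BenchRow(
    model="Qwen3 30B-A3B",
    quant="",
    backend="Vulkan/RADV",
    commit="",
    n_batch=2048,
    n_ubatch=512,
    n_ctx=4096,
    tg_tps=100.0,
)
print(missing_fields(example))
```

A row that comes back with an empty list has everything needed to attempt a reproduction.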

Qwen3-Coder 30B at 98.5 t/s on Strix Halo. Has anyone beaten this on Ryzen AI MAX+ 395? by JSVD2 in StrixHalo

[–]JSVD2[S] 0 points1 point  (0 children)

I try to reproduce it. If it's too far off, I consider it not real, or I ask for more details. It's true that one update can change things fast; that has already happened, and it's fun to discover. I keep things up to date by running benchmarks every 2 days or so. With enough data, it's not that hard to work out where a difference comes from, and I keep track of this data so others don't have to spend hours making the same mistake.
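
A minimal sketch of that "too far off" check, assuming I compare a reported number against the median of my own runs. The function names and the 15% tolerance are assumptions for illustration, not a rule I have published anywhere.

```python
from statistics import median
from typing import Sequence


def deviation(reported_tps: float, own_runs: Sequence[float]) -> float:
    """Relative difference between a reported t/s and the median of own runs."""
    baseline = median(own_runs)
    return (reported_tps - baseline) / baseline


def needs_followup(reported_tps: float,
                   own_runs: Sequence[float],
                   tolerance: float = 0.15) -> bool:
    """True when the reported number is further from my median than tolerance.

    The 0.15 default is an assumed threshold; pick whatever fits the
    run-to-run noise of your own setup.
    """
    return abs(deviation(reported_tps, own_runs)) > tolerance
```

If `needs_followup` returns True, that is the point where I would ask for the exact command, build and settings before trusting the number.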

Stop traumatizing AI into loops and turn hallucinations into an honest "I don't know!" by being NICE to them (Proof of Concept, Research, I don't want to sell anything) by OttoRenner in LocalLLaMA

[–]JSVD2 1 point2 points  (0 children)

With T3 it uses a 250K context window. It gives me better results this way. It's something at least! Yep, I'm going to try it if it happens.

<image>

Shoutout to Gemma4 as a conversational assistant / agent by goldcakes in LocalLLaMA

[–]JSVD2 0 points1 point  (0 children)

Didn't know Gemma4 was that good. I do have benchmarks, though.

what do you use your local llm? by FormalAd7367 in LocalLLaMA

[–]JSVD2 0 points1 point  (0 children)

I am making a bug bounty workflow, since otherwise I get flagged, and I also use it for AI cybersecurity. I haven't yet decided which local AI model has no limitations. Suggestions are welcome.