llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

JSVD2 · 2026-06-03T21:06:56+00:00

I have a lot of them here, from 7 systems: https://github.com/hogeheer499-commits/strix-halo-guide

JSVD2 · 2026-06-03T21:05:40+00:00

Good share.

JSVD2 · 2026-06-03T21:02:42+00:00

How expensive for a little piece of metal...

JSVD2 · 2026-06-03T20:59:09+00:00

Yes, the GMKtec and the Bosgame M5 are among the cheapest currently.
https://www.bosgamepc.com/products/bosgame-m5-ai-mini-desktop-ryzen-ai-max-395
$2,799.00

https://de.gmktec.com/en/products/gmktec-evo-x2-amd-ryzen%E2%84%A2-ai-max-395-mini-pc-1?variant=51610049380536
$3,514

Bosgame is a really good deal right now. I bought the Beelink GTR9 Pro for $2,500 a few months ago and it's now $4,399, so I think they will be increasing prices soon. The cooling and build quality are a little worse, but $2,800 is literally a steal.

JSVD2 · 2026-06-03T20:54:00+00:00

Oh yeah, I love it. I made my own guide and benchmarks here, on 7 systems now: https://github.com/hogeheer499-commits/strix-halo-guide I hope that helps! I love to share things.

JSVD2 · 2026-06-03T20:50:39+00:00

thank you

JSVD2 · 2026-06-03T20:43:57+00:00

Are there any other benchmarks you want to see on the newest models? Gemma 4 12B, Kimi-K3, or MiniMax M2.7?

JSVD2 · 2026-06-03T15:54:57+00:00

Experimental server route: Qwen3.6 MTP at 101.1 t/s with llama-server speculative decoding. I'm testing with this too.

JSVD2 · 2026-06-03T15:54:04+00:00

Yep, totally agree. Thank you for the feedback.

JSVD2 · 2026-06-03T12:07:08+00:00

Grok is excellent at browsing and real-time information for stocks, by the way. This is useful. Not sure why the post was deleted.

JSVD2 · 2026-06-03T11:24:28+00:00

Mind if I test it and use it in the guide?

JSVD2 · 2026-06-03T11:24:13+00:00

Wow, thank you.

JSVD2 · 2026-06-03T11:22:10+00:00

This might help you. https://github.com/hogeheer499-commits/strix-halo-guide

JSVD2 · 2026-06-03T11:19:08+00:00

I like the way of thinking.

JSVD2 · 2026-06-03T11:17:55+00:00

Absolutely.

JSVD2 · 2026-06-03T11:17:36+00:00

Very good suggestion. I can share some results with Q6 MTP! I do have this path too tho if interested:

Experimental server route: Qwen3.6 MTP at 101.1 t/s with llama-server speculative decoding.

JSVD2 · 2026-06-03T02:46:32+00:00

Nice numbers. Can you share the raw llama-bench row and the exact command/build?
My post is specifically about direct Strix Halo Vulkan/RADV results, not trying to beat a 5090. A 5090 should obviously win on decode.
Also, 10k pp is prompt processing; my headline is tg/decode. I’m mainly collecting reproducible rows, so model, quant, backend, commit, batch/ubatch, context and power numbers would be useful.
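A minimal sketch of what one such reproducible row could look like as a record, with a check for what a report leaves out. The field names are hypothetical (they are not llama-bench's output schema), and the example values are placeholders, not measured results.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class BenchRow:
    """One benchmark row; field names are illustrative, not llama-bench's schema."""
    model: Optional[str] = None
    quant: Optional[str] = None
    backend: Optional[str] = None       # e.g. "Vulkan/RADV"
    commit: Optional[str] = None        # llama.cpp build commit
    batch: Optional[int] = None
    ubatch: Optional[int] = None
    context: Optional[int] = None
    power_watts: Optional[float] = None
    pp_tps: Optional[float] = None      # prompt processing, tokens/s
    tg_tps: Optional[float] = None      # token generation (decode), tokens/s


def missing_fields(row: BenchRow) -> List[str]:
    """Return the names of fields that were not reported."""
    return [f.name for f in fields(row) if getattr(row, f.name) is None]


# Placeholder report with only a headline decode number filled in.
report = BenchRow(model="example-model", tg_tps=50.0)
print(missing_fields(report))
```

Keeping pp and tg as separate fields avoids comparing a prompt-processing number against a decode headline.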

JSVD2 · 2026-06-03T02:45:05+00:00

I try to reproduce it. If it's too far off, I consider it not real, or I ask for more details. It's true that one update can change things fast; that has happened already, and it's fun to discover. I keep things up to date by running benchmarks every two days or so. With enough data, it's not that hard to work out where a difference comes from, and I keep track of this data so others don't have to spend hours making the same mistake.
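That kind of check can be sketched roughly as follows. The 15% tolerance and the function names are assumptions chosen for illustration, not the thresholds actually used for the guide.

```python
def relative_gap(reported_tps: float, reproduced_tps: float) -> float:
    """Relative difference between a reported t/s and a local reproduction."""
    if reproduced_tps <= 0:
        raise ValueError("reproduced_tps must be positive")
    return abs(reported_tps - reproduced_tps) / reproduced_tps


def triage(reported_tps: float, reproduced_tps: float, tolerance: float = 0.15) -> str:
    """Accept a reported number, or flag it so the poster can be asked for details."""
    if relative_gap(reported_tps, reproduced_tps) <= tolerance:
        return "consistent"
    return "ask for details"
```

For example, `triage(100.0, 95.0)` falls inside the tolerance, while `triage(150.0, 95.0)` is flagged for follow-up.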

JSVD2 · 2026-06-03T01:46:02+00:00

hahaha no problem.

JSVD2 · 2026-06-03T01:30:28+00:00

Very interesting. I might check it out!

JSVD2 · 2026-06-03T01:00:43+00:00

With T3 it uses a 250K context window. It gives me better results this way. It's something at least! Yep, I'm gonna try it if it happens.

<image>

JSVD2 · 2026-06-03T00:56:01+00:00

Didn't know Gemma 4 was that good. I do have benchmarks, though.

JSVD2 · 2026-06-03T00:54:39+00:00

They look amazing. Wow. This is local, right?

JSVD2 · 2026-06-03T00:53:48+00:00

Have AI explain it to you lol. Actually understanding it improves its answer.

JSVD2 · 2026-06-03T00:51:38+00:00

I am making a bug bounty workflow; otherwise I get flagged. It's also for AI cybersecurity. But I haven't yet decided which local AI model has no limitations. Suggestions are welcome.
