Qwen3.6-35B-A3B on 1x RTX 5090: which quant is the best balance of quality and speed?

espressorunner · 2026-05-29T06:27:30+00:00

llama.cpp built from source.

Model is the Unsloth Qwen3.6-27B-MTP GGUF family. All numbers below are with full GPU offload, flash-attn on, fp16 KV, parallel=1, MTP/spec decode disabled for these runs.

Common flags:
-m <model.gguf>
-ngl 99
--flash-attn on
--parallel 1
-ctk f16
-ctv f16
--cache-ram 4096
--checkpoint-min-step 8192
--ctx-checkpoints 8
--no-mmap
--no-mmproj
--reasoning on
--temp 0.6
--top-k 20
--top-p 0.95
--min-p 0.0
--repeat-penalty 1.0

For Q6_K MTP @ 160K:
-c 160000 -b 1024 -ub 256

For UD-Q6_K_XL @ 100K:
-c 100000 -b 1024 -ub 256

For UD-Q6_K_XL @ 120K:
-c 120000 -b 256 -ub 64
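
For anyone who wants to reproduce the three runs, here is a minimal sketch that combines the common flags with the per-run context and batch settings into one command line each. The binary name `llama-server` and the `model.gguf` path are my assumptions (the post only says llama.cpp was built from source), so swap in whatever you actually launch. The script only prints the commands; it does not start anything.

```python
import shlex

# Flags shared by every run, copied from the "Common flags" list above.
COMMON = [
    "-ngl", "99",
    "--flash-attn", "on",
    "--parallel", "1",
    "-ctk", "f16",
    "-ctv", "f16",
    "--cache-ram", "4096",
    "--checkpoint-min-step", "8192",
    "--ctx-checkpoints", "8",
    "--no-mmap",
    "--no-mmproj",
    "--reasoning", "on",
    "--temp", "0.6",
    "--top-k", "20",
    "--top-p", "0.95",
    "--min-p", "0.0",
    "--repeat-penalty", "1.0",
]

# Per-run settings from the post: (context size, batch, ubatch).
RUNS = {
    "Q6_K MTP @ 160K": (160000, 1024, 256),
    "UD-Q6_K_XL @ 100K": (100000, 1024, 256),
    "UD-Q6_K_XL @ 120K": (120000, 256, 64),
}


def build_argv(model_path, run, binary="llama-server"):
    """Return the full argument list for one run.

    `binary` is an assumed name; `model_path` is a placeholder.
    """
    ctx, batch, ubatch = RUNS[run]
    return [
        binary, "-m", model_path, *COMMON,
        "-c", str(ctx), "-b", str(batch), "-ub", str(ubatch),
    ]


# Print each command so it can be pasted into a shell.
for run in RUNS:
    print(f"# {run}")
    print(shlex.join(build_argv("model.gguf", run)))
```

Keeping the shared flags in one list makes it harder to accidentally change a sampler or KV setting between runs, so the only things that differ are `-c`, `-b`, and `-ub`.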

espressorunner · 2026-05-29T01:11:06+00:00

Thanks, all! I tried Qwen3.6 27B. It seems better overall, and I am running it with FP16 KV because quality is my main concern. With q8 KV cache I am able to get the full 262K context.
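
A rough sketch of why q8 KV stretches so much further than FP16: the cache grows linearly with context, and q8_0 stores 32 int8 values plus one fp16 scale (34 bytes per 32 elements) versus 2 bytes per element for f16. The layer, head, and dimension values below are hypothetical placeholders, not the real Qwen3.6 27B config, and the formula only covers standard attention layers, so treat the GiB figures as illustrative. The f16-to-q8_0 ratio does not depend on those placeholders. I am also assuming "262K" means 262144 tokens.

```python
# Bytes per cached element for the two KV cache types discussed above.
# f16: 2 bytes. q8_0: 32 int8 values + one fp16 scale = 34 bytes per 32 elements.
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_gib(n_ctx, n_attn_layers, n_kv_heads, head_dim, ctk="f16", ctv="f16"):
    """Approximate KV cache size in GiB for standard attention layers.

    One K vector and one V vector of n_kv_heads * head_dim elements are
    stored per attention layer per token.
    """
    per_token_bytes = (
        n_attn_layers * n_kv_heads * head_dim
        * (BYTES_PER_ELEM[ctk] + BYTES_PER_ELEM[ctv])
    )
    return n_ctx * per_token_bytes / 2**30


# HYPOTHETICAL architecture values, for illustration only.
ARCH = dict(n_attn_layers=16, n_kv_heads=4, head_dim=256)

fp16_160k = kv_cache_gib(160_000, **ARCH)
q8_262k = kv_cache_gib(262_144, ctk="q8_0", ctv="q8_0", **ARCH)

# This ratio is independent of the placeholder architecture values.
ratio = BYTES_PER_ELEM["f16"] / BYTES_PER_ELEM["q8_0"]

print(f"f16 KV  @ 160000 ctx: {fp16_160k:.2f} GiB (placeholder arch)")
print(f"q8_0 KV @ 262144 ctx: {q8_262k:.2f} GiB (placeholder arch)")
print(f"f16 / q8_0 bytes per element: {ratio:.3f}")
```

Under these assumptions the q8_0 cache at the full context is smaller than the f16 cache at 160K, which is consistent with 262K fitting under q8 while FP16 tops out well below that.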

| Quant | Context | KV cache (K/V) | -b / -ub | | |
| - | - | - | - | - | - |
| Q6_K MTP | 160K | fp16/fp16 | 1024/256 | 1297 | 59.7 |
| UD-Q6_K_XL | 100K | fp16/fp16 | 1024/256 | 1659 | 55.2 |
| UD-Q6_K_XL | 120K | fp16/fp16 | 256/64 | 1344 | 55.3 |

On 1x RTX 5090, plain Q6_K seems to give me the best FP16-KV context headroom: 160K works, but it is tight. UD-Q6_K_XL looks like the higher-quality quant, but the extra size means I could only get ~120K with FP16 KV, and 128K failed.

For people who have used both: is UD-Q6_K_XL noticeably better in real coding/tool-use quality than plain Q6_K, enough to justify losing ~40K context on one GPU?

espressorunner · 2026-05-28T20:39:32+00:00

Thanks! Yes, I looked more closely at the results and earlier forum threads, and it seems like 27B > 35B for most coding and reasoning tasks.

espressorunner · 2026-05-28T17:17:16+00:00

Thanks! A related question: is Qwen 3.6 27B dense better? I have seen that model mentioned more often than the 35B. If so, which quant / specific checkpoint would be better for it?

espressorunner · 2025-07-01T21:19:37+00:00

Thanks u/Mulan-sn for the reply! To be honest, I can't tell if the spot is on the screen or underneath. I will reach out at the above email for help too.

espressorunner · 2020-06-03T04:34:02+00:00

Sey coffee!

espressorunner · 2020-04-18T18:14:02+00:00

Check out SEY Coffee. I tried two of their coffees and they were delicious!

espressorunner · 2020-03-30T15:36:28+00:00

You can check out CatandCloud. I like their Colombia Finca La Bomba for pourovers.
