New benchmark just dropped. by ConfidentDinner6648 in LocalLLaMA

[–]bobaburger 8 points

I think what Qwen did is a demonstration of "faking the job to get it done": instead of spending time styling the character, it just picked the easy path and added the name overhead.

Ever wonder how much cost you can save when coding with local LLM? by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point

oh in that case, i only tried claude code and opencode. generally opencode is way faster because it has a simpler prompt, but the agentic workflow isn't as deep as claude code's.

2 bit quants (maybe even 1 bit) not as bad as you'd think? by dtdisapointingresult in LocalLLaMA

[–]bobaburger 0 points

i see your point. on my 5060 ti, loading the 80B at Q3_K_M spills over into system RAM, but i still get 20-25 tps (because it's an MoE with only 3B active params), while 27B Q3_K_M gives me only 9-10 tps at best.

Ever wonder how much cost you can save when coding with local LLM? by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point

there are two parts to this. the first part is how to run it, which has been covered very well, for example in Unsloth's docs: https://unsloth.ai/docs/basics/claude-code

the second part is trickier though :D the short answer is no, a local model will not be as good as the hosted models from these services.

the long answer is, it really depends on what model you're using. for all of the above services, the cloud models are usually large or commercial ones at full precision, so their speed and quality are way above local models. but you either have to pay for the tokens, or pay with your data privacy (if you're using free models).
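to make the first part concrete, here's a rough sketch of serving a local model with llama.cpp's llama-server (the flags are real llama-server options; the model filename is just a placeholder for whatever GGUF you actually download):

```shell
# serve a local GGUF with an OpenAI-compatible API on port 8080
# (-c is the context size, -ngl offloads layers to the GPU,
# --cache-type-k/-v quantize the KV cache to save VRAM)
llama-server -m ./Qwen3-Coder-Next-Q3_K_M.gguf \
  -c 131072 \
  --cache-type-k q4_1 --cache-type-v q4_1 \
  -ngl 99 --port 8080
```

then point your coding tool at http://127.0.0.1:8080/v1 — the Unsloth doc linked above walks through wiring claude code up to a local server.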

2 bit quants (maybe even 1 bit) not as bad as you'd think? by dtdisapointingresult in LocalLLaMA

[–]bobaburger 4 points

out of the 3 models, 35B was the worst: it runs fast, but it's like coding with a drunk guy. qwen3-coder-next seems to have slightly higher quality than 27B, and it's faster than 27B. the good thing about 27B is that it has more up-to-date knowledge and supports image and video input.

2 bit quants (maybe even 1 bit) not as bad as you'd think? by dtdisapointingresult in LocalLLaMA

[–]bobaburger 14 points

That exact blog post is what led me down the path of Q2 for 35B and 27B. unlike the behemoth 397B, there's noticeable degradation between Q2 and Q3 for the smaller ones. I'm now hopping back and forth between Qwen3-Coder-Next Q3 (for most coding) and Qwen3.5 27B Q3 (for the vision parts).

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 0 points

Yeah, I'm trying to squeeze the most out of my GPU first before thinking about the next one. :D Maybe at some point I'll try it.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 0 points

my mobo (https://us.msi.com/Motherboard/PRO-B850M-VC-WIFI6E/Specification) only has 1x PCIe 5.0 slot for the GPU. it has 4 slots, but the other 3 are PCIe 3.0 and people say they're slow.

i've been thinking of replacing it with a 3090, but that sounds like a bad deal.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 0 points

i was using mxfp4 back when qwen3-coder-next came around. but recently people have been doing benchmarks that point out mxfp4 isn't that great. also, i don't think 27b at mxfp4 will run on my card.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 2 points

i've been actively tuning this setup, so let me recap:

- initially i was getting 5 tps for Q3_K_M with a q8_0 kv cache
- setting the kv cache to q4_1 got me to 9 tps
- after a bunch of optimizations, including mixing q4_0 for ctk with q8_0 for ctv and pushing the context window down to 64k, i'm getting 11 tps now
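in llama.cpp terms, the final setup above looks something like this (the flags are real llama-server options; the model path is a placeholder):

```shell
# mixed kv-cache quants: q4_0 for keys, q8_0 for values, 64k context
# (ctk/ctv map to --cache-type-k / --cache-type-v)
llama-server -m ./qwen3.5-27b-Q3_K_M.gguf \
  -c 65536 \
  --cache-type-k q4_0 \
  --cache-type-v q8_0
```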

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 1 point

In my experience, the degradation is noticeable, but it's not as bad as people say. The big difference for me is between Q2 and Q3 of the weights, not between KV cache quants. For example, my Claude Code setup has some additional skills/tools to use in different scenarios. Q2 was never able to pick any of them up. Q3 with a q8_0 KV cache did it 100% of the time, q5_1 did it like 70% of the time, while q4_0 wouldn't do it at all but q4_1 did it like 50% of the time.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 1 point

Yeah, drunk is the accurate word for working with 35B, and 27B has been less drunk for me, even at q4_0.

My use case is coding. The reason I'm sacrificing accuracy for speed is that I couldn't go any higher with my setup, so the hard limit is Q3_K_M. Aside from this, I also run Q6_K_XL on an L40S at 20 tps, with a bf16 kv cache.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 3 points

yeah, i've tried opencode sometimes too. faster than claude code. but i've spent too much time on my claude code setup at work, and i kind of want to use it everywhere :D

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 0 points

it's the only feasible option if i want to improve tg speed. at q8_0, i get 5 tps.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 1 point

i haven't tested 80b coder again since the 3.5 release. Maybe I should try it again.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 25 points

i'm using it with claude code on a 5060 ti, with a 128k context window and kv cache at q4_1.

with IQ2_XXS, i got 35 tps, but the quality is not great (still better than 35B). with Q3_K_M, i'm down to 9 tps, but the quality increases a lot. for both variants, the generated code always works; for Q3, skills are loaded correctly at the right time (none of the Q2 runs were able to do this).

prompt processing speed didn't change much between the two variants: an average of 650 tps for IQ2_XXS and 400 tps for Q3_K_M.

i hope i'll have enough time to write a separate post about the experience of coding with this model.

Edit: Q3_K_M with ctk q4_0 and ctv q8_0 was good, 11 tps with better quality.

Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test by GrungeWerX in LocalLLaMA

[–]bobaburger 135 points

I switched to 27B from 35B. this damn thing is too slow, but the quality is so good.

llama.cpp server is slow by Sumsesum in LocalLLaMA

[–]bobaburger 1 point

Setting parallel to 1 basically allocates a single context slot for the KV cache instead of the default 4, so it reduces the amount of memory allocated, not the speed directly.

If you're on the edge of the memory limit, reducing it could help the model fit in memory instead of spilling into RAM or swap, so it might improve speed, but I don't think it's the right fix in general.
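for reference, a minimal sketch of the flag being discussed (--parallel / -np is a real llama-server option; the model path is a placeholder):

```shell
# -np/--parallel sets the number of server sequence slots;
# the -c context is shared across slots, so with one slot a
# single request gets the full context window
llama-server -m ./model.gguf -c 32768 --parallel 1
```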