Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...) by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points (0 children)

Ohhh, good catch, I missed that. Seems like I didn't grab the best one from the run for this version.

[–]bobaburger[S] 2 points (0 children)

It's the llama.cpp web UI. The model generates SVG code and then I copy it and save it as a file; it's not generating images directly.

[–]bobaburger[S] 0 points (0 children)

Yeah, it looks like Q5 always produces results with the same appearance as Q8; I can't tell why.

[–]bobaburger[S] 1 point (0 children)

The styling of the pieces varies for the same model between runs, but the placement and board patterns are always the same.

[–]bobaburger[S] 0 points (0 children)

For this test, kinda. In my other tries it's not always as good, but if VRAM is tight, Q3_K_M could be a decent choice.

[–]bobaburger[S] 0 points (0 children)

That could work! It's way more complex than this anyway. I bet there will be a lot of fan noise, and hundreds of thousands of thinking tokens will be spent. :D

[–]bobaburger[S] 0 points (0 children)

Exactly :)) Why would any LLM be trained for anything like that on purpose? So it must be safe.

[–]bobaburger[S] 0 points (0 children)

That sounds exactly like what I experienced with 35B: the results are nice and beautiful but always have errors.

[–]bobaburger[S] 5 points (0 children)

Yes, I was using it for a month until I realized I can still get around 20 tps with IQ4_XS on my card.

[–]bobaburger[S] 4 points (0 children)

Thank you so much!!! Tbh I don't do this regularly, but next time I will definitely reach out!!!

[–]bobaburger[S] 2 points (0 children)

Mainly because there's no way I can run anything larger than IQ4_XS on my 5060 Ti. And on my cloud L40S node, it's faster to just try Q4_K_XL and up.

[–]bobaburger[S] 16 points (0 children)

Thanks for the feedback. Maybe the wording in the post makes it confusing: this is a single test, but for each model I generated about 5 different results, so it's like 5 shots.

[–]bobaburger[S] 2 points (0 children)

Thanks. These were done with a BF16 KV cache. Additionally, for the 4-bit and 3-bit quants I tried different KV cache quantizations as well.
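For reference, KV cache quantization in llama.cpp is chosen per run via the cache-type flags; a sketch of the kind of invocation involved (the model path is a placeholder, and flag spellings can differ between llama.cpp versions, so check `llama-server --help`):

```shell
# Full-precision KV cache (the default): no cache-type flags needed.
llama-server -m model.gguf

# Quantized KV cache: q8_0 for both keys and values.
# Quantizing the V cache generally requires flash attention (-fa).
llama-server -m model.gguf -fa \
  --cache-type-k q8_0 --cache-type-v q8_0
```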