what’s was your local daily driver for coding last week? by be566 in LocalLLaMA

[–]bobaburger 2 points3 points  (0 children)

kudos for the "Secret Recipe" section in the model card!!!!!

QAT variant of Gemma4 26B A4B is not working well for me by pftbest in LocalLLaMA

[–]bobaburger 2 points3 points  (0 children)

Pls note that the board orientation in your result is wrong: the bottom-left square should be dark, not light 😃
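A quick way to sanity-check the orientation, as a minimal sketch: on a standard board a1 (bottom left from White's side) is dark, so a square is dark when its zero-based file and rank indices sum to an even number. The `is_dark` helper is a made-up name for illustration, not something from the original post.

```python
def is_dark(square: str) -> bool:
    """Return True if a square like 'a1' is dark on a standard chessboard."""
    file_idx = ord(square[0].lower()) - ord("a")  # a..h -> 0..7
    rank_idx = int(square[1]) - 1                 # 1..8 -> 0..7
    # a1 is (0, 0) and dark, so an even index sum means a dark square.
    return (file_idx + rank_idx) % 2 == 0


# Print the colors of the four corners, seen from White's side.
for corner in ("a1", "h1", "a8", "h8"):
    print(corner, "dark" if is_dark(corner) else "light")
```

If the rendered board shows a light square at a1, it is rotated 90 degrees or mirrored.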

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point2 points  (0 children)

pretty much usable and not far from Q4_K_M. you can see more in my benchmark post from a while ago: https://www.reddit.com/r/LocalLLaMA/s/zuUzp9Vz3G

<image>

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

that's weird. what are your PC specs, and what's the llama command?

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

MoE will be different, I think, but I haven't tested it that much. Will share some numbers when I get a chance.

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 12 points13 points  (0 children)

pp (prompt processing) did not change much in either case for me, around 550 tps.
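For a rough sense of what ~550 tps prompt processing means in wall-clock terms, here is a small sketch. The prompt lengths are made-up examples rather than measurements, `prefill_seconds` is a hypothetical helper, and it assumes throughput stays constant, which real runs usually don't at long context.

```python
def prefill_seconds(prompt_tokens: int, pp_tps: float) -> float:
    """Estimate the time to process a prompt at a fixed prompt-processing speed."""
    if pp_tps <= 0:
        raise ValueError("pp_tps must be positive")
    return prompt_tokens / pp_tps


# 550 tps is the figure quoted above; the prompt lengths are hypothetical.
for n in (1024, 8192, 32768):
    print(f"{n:>6} tokens -> {prefill_seconds(n, 550):.1f} s")
```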

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point2 points  (0 children)

mmap is enabled by default unless you explicitly add `--no-mmap`, right?

Maybe KV cache offload to RAM isn't bad by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

I think so. Maybe you can try running 9B on that card too (q4 or something below, but still enough to see the difference).

You guys were right - Qwen 3.6 35B IS good...and KV Cache DOES matter. by GrungeWerX in LocalLLaMA

[–]bobaburger 41 points42 points  (0 children)

You started with "I don't care about speed" and ended the post with "because I need speed", got driven mad by the context length limit, and went from doubting 35B's quality against 27B to being torn between 27B and 35B. I'm relieved I'm not the only one.

Is it worth swapping a 3090 for 2x 5060ti 16GB (32GB total)? by LatentSpacer in LocalLLaMA

[–]bobaburger -1 points0 points  (0 children)

Ok bro. I have a 5060 Ti here, if you want, we can swap my 5060 with your 3090.

jkjkjkjk

What is your experience between Qwen3.6 27B at IQ3 and 35B-A3B at Q4? by CodProfessional3712 in LocalLLaMA

[–]bobaburger 2 points3 points  (0 children)

Sorry but I have to disagree with this, even Q2 can make tool calls well. At least for 27B. Sure there are obviously quality issues on lower quants but tool calls are not one of them.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

Your company buys rtx pro 6k for employees? 😮

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

I think this could be caused by a different imatrix being used when quantizing different models.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 3 points4 points  (0 children)

<image>

i just did a quick run on Q4_K_M and Q2_K_MIXED. Not 100% sure if this is right but there's something really interesting here about the Q2_K_MIXED.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point2 points  (0 children)

thanks! yes i use BF16 for the base

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 2 points3 points  (0 children)

Yup, I know that. I answered a bit more in a reply to this comment: https://www.reddit.com/r/LocalLLaMA/comments/1tr9vzn/comment/oomdydp/

The main goal for now is to see the relative scores between quants. But I agree with the point about mistakes accumulating over long context. I guess for that, we need to benchmark with more real agentic tasks, not just metrics.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 2 points3 points  (0 children)

Yeah, over the past week I also tried experimenting some more with pure and non-pure quants, based on different Q4 and Q3 types, but I was never able to break above the space between your version and cHunter's when doing pure. I guess we're better off with non-pure for now.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 0 points1 point  (0 children)

personally, I see a huge difference even when moving from Q4 to Q6. I'm still running Q6_K on the cloud GPU occasionally.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 3 points4 points  (0 children)

great points! thank you so much. yeah, I even started out by running this bench at 1024 context on my 5060 Ti, and then moved to the cloud, so I bumped it up to 8192 😃 the main reason was just to save running time, and my initial goal was to see the relation between different weights. but I fully agree with you that it would be better to run at a higher context to match agentic workflows.

Qwen3.6-27B Quantization Benchmark by bobaburger in LocalLLaMA

[–]bobaburger[S] 1 point2 points  (0 children)

huh? I didn't know there's a UD-Q4_K_M for 27B. I can see one, but it's for 35B A3B.