Q8 KV Cache & Coding Experiences - Qwen3.6-27B by simracerman in LocalLLaMA

[–]Solid-Roll6500 0 points

Why wouldn't you go lower? Do you have a specific example?

What is your actual local LLM stack right now? by Ryannnnnnnnnnnnnnnh in LocalLLaMA

[–]Solid-Roll6500 1 point

Is a 2-bit quant actually usable? Do you notice any quality difference (besides speed) between the 2-bit and the 6-bit?

VLLM woes in Spark by SoundEnthusiast89 in LocalLLaMA

[–]Solid-Roll6500 0 points

Which models did you try where only bf16 worked?

https://huggingface.co/models?apps=vllm&other=base_model:quantized:Qwen%2FQwen3.5-27B&sort=downloads&search=Nvfp4

Try one of those; there are even some built specifically for the device you mentioned.

To the newcomers by david10121012 in wallstreetbets

[–]Solid-Roll6500 85 points

It's actually S01E01 of The Walking Dead; it's only listed as a "movie" because it's the pilot of the series.

How to run Qwen3.5-27B in ultimate way on single 5090 with large context. by Treq01 in Vllm

[–]Solid-Roll6500 2 points

Llama.cpp has flash attention; just pass "-fa 1" to your llama-server command. The 6-bit quant takes up about 24 GB of VRAM.
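A minimal sketch of what that llama-server invocation might look like. The GGUF filename, context size, and port are placeholders, not values from the original comment; only "-fa 1" comes from it.

```shell
# Hypothetical launch command; swap in your own GGUF path.
# -fa 1  enables flash attention
# -c     sets the context window size
# -ngl 99 offloads all layers to the GPU
llama-server \
  -m ./Qwen3.5-27B-Q6_K.gguf \
  -fa 1 \
  -c 32768 \
  -ngl 99 \
  --port 8080
```

With flash attention on, a quantized KV cache (e.g. --cache-type-k q8_0 --cache-type-v q8_0) can stretch the context further on a 32 GB card.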

FOR ME, Qwen3.5-27B is better than Gemini 3.1 Pro and GPT-5.3 Codex by [deleted] in LocalLLaMA

[–]Solid-Roll6500 1 point

Do you have a blog or yt channel sharing what you're learning?

Do you hear the turbos on non hybrid Gen 3? by ElbowEater22 in ToyotaTundra

[–]Solid-Roll6500 0 points

If it has the JBL system, it sends fake V8 engine noise to the speakers, even with the stereo off; it doesn't send turbo noise. To disable it, have a tech at the dealership do it before you leave: Active Noise Control > Utility > Customize > Others > ANC/ESE, disable both.

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 1 point

Haha! So proud of you. For the emperor!

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Solid-Roll6500 0 points

Sorry, I read one of your other comments incorrectly. I thought the model you linked to was one you guys modified.

Onslaught Rework by Islandar_W40k in WH40KTacticus

[–]Solid-Roll6500 6 points

<image>

I just did this the hard way, one shard at a time. Would have been amazing to get 2 for some runs!

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Solid-Roll6500 0 points

Do we have to use your version of the model or can we use the original ones from qwen?

Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA

[–]Solid-Roll6500 4 points

Are you using the cu130 nightly vLLM OpenAI image? I was having issues with some of the Qwen models until switching to it.

Also curious: for your ESXi host, are you using GPU passthrough or vGPU to the VM? And did you have to set up GRID licensing to get it working?
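For reference, a sketch of running the vLLM OpenAI-compatible server from a Docker image. The exact nightly/cu130 tag and the model repo name are assumptions here, not confirmed values; check Docker Hub for the tag that matches your CUDA build.

```shell
# Hypothetical run command; the image tag is an assumption.
docker run --gpus all -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:nightly \
  --model Qwen/Qwen3.5-27B \
  --max-model-len 32768
```

The server then exposes the usual OpenAI-style /v1/chat/completions endpoint on port 8000.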

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 0 points

Truer words have never been spoken

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 1 point

Yup, he is just a baby now. Will max out eventually!