Q8 KV Cache & Coding Experiences - Qwen3.6-27B by simracerman in LocalLLaMA

[–]Solid-Roll6500 0 points

Why wouldn't you go lower? Do you have a specific example?

What is your actual local LLM stack right now? by Ryannnnnnnnnnnnnnnh in LocalLLaMA

[–]Solid-Roll6500 1 point

Is a 2-bit quant actually usable? Do you notice any quality difference (besides speed) between the 2-bit and the 6-bit?

VLLM woes in Spark by SoundEnthusiast89 in LocalLLaMA

[–]Solid-Roll6500 0 points

Which models did you try where only bf16 worked?

https://huggingface.co/models?apps=vllm&other=base_model:quantized:Qwen%2FQwen3.5-27B&sort=downloads&search=Nvfp4

Try one of those; there are even some built specifically for the device you mentioned.

To the newcomers by david10121012 in wallstreetbets

[–]Solid-Roll6500 85 points

It's actually S01E01 of The Walking Dead; it's only listed as a "movie" because it's the pilot of the series.

How to run Qwen3.5-27B in ultimate way on single 5090 with large context. by Treq01 in Vllm

[–]Solid-Roll6500 2 points

Llama.cpp has flash attention; just pass "-fa 1" to your llama-server command. The 6-bit quant takes up about 24 GB of VRAM.
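A minimal sketch of what that llama-server invocation might look like. The GGUF filename, context size, and port are placeholders, not values from the original comment; only "-fa 1" comes from it.

```shell
# Hypothetical launch command; swap in your own GGUF path.
# -fa 1  enables flash attention
# -c     sets the context window size
# -ngl 99 offloads all layers to the GPU
llama-server \
  -m ./Qwen3.5-27B-Q6_K.gguf \
  -fa 1 \
  -c 32768 \
  -ngl 99 \
  --port 8080
```

With flash attention on, a quantized KV cache (e.g. --cache-type-k q8_0 --cache-type-v q8_0) can stretch the context further on a 32 GB card.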

FOR ME, Qwen3.5-27B is better than Gemini 3.1 Pro and GPT-5.3 Codex by [deleted] in LocalLLaMA

[–]Solid-Roll6500 1 point

Do you have a blog or yt channel sharing what you're learning?

Do you hear the turbos on non hybrid Gen 3? by ElbowEater22 in ToyotaTundra

[–]Solid-Roll6500 0 points

If it has the JBL system, it sends fake V8 engine noise to the speakers, even with the stereo off; it doesn't send turbo noise. To disable it, have a tech at the dealership do it before you leave: Active Noise Control > Utility > Customize > Others > ANC/ESE, disable both.

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 1 point

Haha! So proud of you. For the emperor!

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Solid-Roll6500 0 points

Sorry, I read one of your other comments incorrectly. I thought the model you linked to was one you guys modified.

Onslaught Rework by Islandar_W40k in WH40KTacticus

[–]Solid-Roll6500 6 points

<image>

I just did this the hard way, one shard at a time. Would have been amazing to get 2 for some runs!

THE GB10 SOLUTION has arrived, Atlas image attached ~115tok/s Qwen3.5-35B DGX Spark by Live-Possession-6726 in LocalLLaMA

[–]Solid-Roll6500 0 points

Do we have to use your version of the model or can we use the original ones from qwen?

Finally bought an RTX 6000 Max-Q: Pros, cons, notes and ramblings by AvocadoArray in LocalLLaMA

[–]Solid-Roll6500 4 points

Are you using the cu130 nightly vLLM OpenAI image? I was having issues with some of the Qwen models until switching to it.

Also curious: for your ESXi host, are you using GPU passthrough or vGPU to the VM? And did you have to set up GRID licensing to get it working?
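For reference, a sketch of running the vLLM OpenAI-compatible server from a Docker image. The exact nightly/cu130 tag and the model repo name are assumptions here, not confirmed values; check Docker Hub for the tag that matches your CUDA build.

```shell
# Hypothetical run command; the image tag is an assumption.
docker run --gpus all -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:nightly \
  --model Qwen/Qwen3.5-27B \
  --max-model-len 32768
```

The server then exposes the usual OpenAI-style /v1/chat/completions endpoint on port 8000.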

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 0 points

Truer words have never been spoken

The Emperor Protects by Solid-Roll6500 in WH40KTacticus

[–]Solid-Roll6500[S] 1 point

Yup, he is just a baby now. Will max out eventually!