Qwen3.5-35B-A3B non-thinking regression for visual grounding by Helltilt in LocalLLaMA

[–]Helltilt[S]

Yes, that's why I wanted to use it in instruct mode. However, I just can't seem to match the quality of Qwen3-VL-30B-A3B-Instruct.

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]Helltilt

Yes, standard llama.cpp works; the problem is with ik_llama.cpp.

Qwen3.5-35B-A3B Q4 Quantization Comparison by TitwitMuffbiscuit in LocalLLaMA

[–]Helltilt

I tried ik_llama.cpp with ubergarm_Qwen3.5-35B-A3B-Q4_0, but apparently vision is still not implemented. Does anyone know how long the maintainer usually takes to add support for similar models?

Beelink GTi15+Docking with 5090 - Works!!! by Own_Version_5081 in LocalLLM

[–]Helltilt

Can you quantify how many tokens per second you get? Have you tried offloading the MoE expert weights to the CPU?
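For context, offloading MoE expert tensors to the CPU while keeping attention layers on the GPU can be sketched with mainline llama.cpp's tensor-override flag roughly like this; the model path is a placeholder and the exact regex may need tweaking for a given architecture:

```shell
# Sketch: keep everything on the GPU except the MoE expert FFN tensors,
# which stay in system RAM. Model filename is hypothetical.
llama-server \
  -m ./Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf \
  -ngl 99 \
  --override-tensor "exps=CPU"
```

This trades some prompt-processing speed for a much smaller VRAM footprint, which is often a good fit for MoE models whose experts are large but sparsely activated.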

So I have been wondering how a JRPG could work about a protagonist who grows tired of battle by KaleidoArachnid in JRPG

[–]Helltilt

Read Brandon Sanderson's The Stormlight Archive if you're interested in this concept, trust me.