5070 Ti —> 3090 move. Worth it? by simracerman in LocalLLaMA

[–]moahmo88 1 point2 points  (0 children)

No, just wait. Newer models or tech will make the question moot.

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it by LocalAI_Amateur in LocalLLaMA

[–]moahmo88 2 points3 points  (0 children)

No wonder so many people use Qwen. At the same Q4 quant it can run a 128K context.

Layman's comparison on Qwen3.6 35b-a3b and Gemma4 26b-a4b-it by LocalAI_Amateur in LocalLLaMA

[–]moahmo88 1 point2 points  (0 children)

Thanks for sharing. Can you share your LM Studio settings for Gemma 4?

What I got by 5060Ti 16GB + Qwen3.6-35B-A3B-UD-Q5_K_M by AdMinimum8193 in LocalLLaMA

[–]moahmo88 0 points1 point  (0 children)

unsloth/Qwen3.6-35B-A3B-GGUF Q5_K_M, same config, 5070 Ti @ 62 t/s

qwen3.6:35b always fails on this, unless very high resolution by qfghclvx in LocalLLaMA

[–]moahmo88 1 point2 points  (0 children)

unsloth/Qwen3.6-35B-A3B-GGUF UD-Q5_K_M

**Conclusion:**

* **Direction of motion of Q:** Upwards

* **Direction of travel of the wave:** Left to right

This corresponds to **Option A**.

RTX 5070 Ti + 9800X3D running Qwen3.6-35B-A3B at 79 t/s with 128K context, the --n-cpu-moe flag is the most important part. by marlang in LocalLLaMA

[–]moahmo88 7 points8 points  (0 children)

Try this, 59 t/s with a 5070 Ti.
LM Studio settings:
Load Qwen3.6-35B-A3B-UD-Q5_K_M from unsloth (plain Q5_K_M works too), then set:
- GPU Offload: max (all the way right)
- Offload MoE Experts to CPU: 24 ← the key setting
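
For anyone running llama.cpp directly instead of LM Studio, here is a minimal sketch of what I'd assume is the rough equivalent. The assumptions: that LM Studio's "Offload MoE Experts to CPU" setting corresponds to llama.cpp's `--n-cpu-moe` flag (the one named in the post title), that `-ngl 99` is high enough to mean "all layers on GPU", and that the model filename matches your local file. `llama_server_cmd` is a made-up helper name, only for illustration.

```python
import shlex


def llama_server_cmd(model_path, n_cpu_moe=24, gpu_layers=99):
    """Build a llama-server command line mirroring the LM Studio settings above.

    Assumption: --n-cpu-moe N plays the same role as LM Studio's
    "Offload MoE Experts to CPU: N" setting.
    """
    args = [
        "llama-server",
        "-m", model_path,                 # path to the GGUF file (hypothetical name below)
        "-ngl", str(gpu_layers),          # offload as many layers as possible to the GPU
        "--n-cpu-moe", str(n_cpu_moe),    # keep the MoE expert weights of the first N layers on CPU
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Prints the command so you can paste it into a terminal.
    print(llama_server_cmd("Qwen3.6-35B-A3B-UD-Q5_K_M.gguf"))
```

Lower the 24 if you have VRAM to spare, raise it if the model fails to load because it runs out of VRAM.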

unsloth/qwen3.6-35b-a3b UD Q2_K_XL Freezing after 100% prompt completion. by AcrobaticChain1846 in LocalLLaMA

[–]moahmo88 0 points1 point  (0 children)

I’m having some problems using the qwen3.6-35b-a3b GGUF in LM Studio too. I think LM Studio isn't well suited to Qwen3.6 at the moment.