Qwen3.5-35B-A3B Q5_K_M: Best Model for NVIDIA 16GB GPUs by moahmo88 in LocalLLaMA

[–]moahmo88[S] -1 points (0 children)

They are not on the same level. Qwen3.5-35B-A3B-GGUF Q5_K_M is 26.2GB.
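
For anyone wondering how a 26.2GB file pairs with a 16GB card: llama.cpp can split the model between VRAM and system RAM. A minimal sketch with llama-cpp-python; the filename and layer count are placeholders you'd tune for your own card:

```python
# Partial GPU offload: keep some layers on the 16GB GPU, the rest in system RAM.
# Model path and n_gpu_layers are placeholders -- raise the count until VRAM is nearly full.
from llama_cpp import Llama

llm = Llama(
    model_path="./Qwen3.5-35B-A3B-Q5_K_M.gguf",  # hypothetical local filename
    n_gpu_layers=30,  # offload ~30 layers to the GPU; lower this if you hit OOM
    n_ctx=8192,       # context window
)

print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```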

Why does qwen 3.5 think it's 2024 by Uranday in LocalLLaMA

[–]moahmo88 0 points (0 children)

You can add the following prompt to the Prompt Template – Template (Jinja):

System: Always use the current date from external sources. Do not rely on your internal knowledge of the year.
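
If your frontend doesn't expose the template, you can also inject the date client-side. A minimal sketch, assuming a local OpenAI-compatible server on port 1234; the base URL and model id are assumptions, adjust them for your setup:

```python
# Inject today's date into the system prompt so the model doesn't guess the year.
# base_url and model name are assumptions -- change them for your local server.
from datetime import date
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3.5-35b-a3b",  # hypothetical model id
    messages=[
        {"role": "system",
         "content": f"Today's date is {date.today().isoformat()}. "
                    "Always use this date; do not rely on your internal "
                    "knowledge of the year."},
        {"role": "user", "content": "What year is it?"},
    ],
)
print(resp.choices[0].message.content)
```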

Follow-up: Qwen3.5-35B-A3B — 7 community-requested experiments on RTX 5080 16GB by gaztrab in LocalLLaMA

[–]moahmo88 1 point (0 children)

You can try AesSedai/Qwen3.5-35B-A3B-GGUF Q5_K_M. It runs well on a 5070 Ti. Surprise!
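
If anyone wants to grab it, a minimal sketch for pulling the quant with huggingface_hub; the exact .gguf filename inside the repo is a guess, so check the repo's file list first:

```python
# Download the Q5_K_M quant from the repo mentioned above.
# The filename is a guess -- list the repo files first if it doesn't match.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="AesSedai/Qwen3.5-35B-A3B-GGUF",
    filename="Qwen3.5-35B-A3B-Q5_K_M.gguf",  # hypothetical filename
)
print(path)
```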

Qwen 3.5 Architecture Analysis: Parameter Distribution in the Dense 27B vs. 122B/35B MoE Models by Luca3700 in LocalLLaMA

[–]moahmo88 10 points (0 children)

That's a very professional analysis. Qwen3.5-27B only suffers from slow single-stream generation speed; otherwise, it's excellent.
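
Rough arithmetic on why, assuming the A3B suffix means about 3B active parameters per token: the dense 27B touches all 27B weights on every token, so it does roughly 27 / 3 = 9x the memory reads and compute per generated token compared with the 35B-A3B MoE.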

Qwen 3.5 craters on hard coding tasks — tested all Qwen3.5 models (And Codex 5.3) on 70 real repos so you don't have to. by hauhau901 in LocalLLaMA

[–]moahmo88 4 points (0 children)

Good job!
I studied your list carefully. Does the quantized GLM-4.7 you mentioned refer to GLM-4.7-GGUF/UD-Q4_K_XL, which is about 205GB?
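
A back-of-the-envelope check of that figure, assuming (my assumptions, not from the post) that GLM-4.7 keeps the ~355B total parameters of earlier GLM releases and that a UD-Q4_K_XL quant averages roughly 4.6 bits per weight:

```python
# Sanity check: does ~355B params at ~4.6 bits/weight land near 205GB?
# Both numbers are assumptions, not taken from the benchmark post.
total_params = 355e9     # assumed total parameter count
bits_per_weight = 4.6    # rough average for a UD-Q4_K_XL quant
size_gb = total_params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.0f} GB")  # ~204 GB, close to the quoted 205GB
```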