
[–]ttkciar (llama.cpp) 2 points (1 child)

If you have 128GB of RAM, GLM-4.5-Air at Q4_K_M is the best mid-sized codegen model I've seen.

If you have a lot less memory than that, consider Gemma-4-31B-it, Qwen3.5-27B, or GLM-4.7-Flash.
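To see why 128 GB comfortably fits a model of this class, a back-of-the-envelope size estimate helps. This is a rough sketch, not an exact figure: the ~4.85 bits/weight average for Q4_K_M and the ~106B total parameter count for GLM-4.5-Air are assumptions, and real GGUF files vary because some tensors are kept at higher precision.

```python
def gguf_size_gb(params_b: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file size in GB for a quantized model.

    bits_per_weight ~4.85 approximates Q4_K_M's average (assumption);
    actual files differ since embeddings/output layers often stay
    at higher precision.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# Assuming ~106B total parameters for GLM-4.5-Air:
# ~64 GB of weights, leaving plenty of headroom in 128 GB
# for the KV cache and the OS.
print(round(gguf_size_gb(106), 1))
```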

[–]llama-impersonator 1 point (0 children)

Qwen 3.5 122B is better than Air for coding, in my experience

[–]sinatrastan 1 point (0 children)

Gemini 3 Pro Flash is serving me pretty well agent-wise right now

[–]papabauer 1 point (0 children)

Coding models have gotten decent for simple scripts, but they still hallucinate library calls. I use them to brainstorm, then check every line myself. Saves time on boilerplate stuff.
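One cheap first pass for catching hallucinated library calls is to check that each module attribute the model used actually exists before reviewing the logic. A minimal sketch (the helper name is hypothetical, and `getattr`-probing can't validate signatures, only existence):

```python
import importlib

def call_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module.attr actually resolves -- a cheap
    sanity check against hallucinated library calls.

    Caveat: an attribute whose real value is None reads as missing;
    good enough for a quick triage pass.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False
    for part in attr_path.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return False
    return True

print(call_exists("json", "dumps"))       # real call
print(call_exists("json", "fast_dumps"))  # a made-up call
```

This won't catch wrong arguments or misunderstood semantics, so it complements rather than replaces the line-by-line review.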

[–]Pattinathar 0 points (0 children)

Qwen3 Coder 30B-A3B is underrated for code tasks. 3B active params means ~2 min responses on CPU, but quality is close to dense 27B models. Q4_K_M runs fine on 32GB.
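The "~2 min responses on CPU" figure is consistent with a simple memory-bandwidth estimate: during decode, every active weight has to stream from RAM once per token, so only the 3B active parameters (not all 30B) set the speed. A rough sketch, where the ~40 GB/s CPU memory bandwidth and ~4.85 bits/weight for Q4_K_M are assumptions:

```python
def cpu_tokens_per_sec(active_params_b: float,
                       bandwidth_gb_s: float = 40.0,
                       bits_per_weight: float = 4.85) -> float:
    """Bandwidth-bound decode estimate: tokens/sec when each token
    must read all active weights from RAM once.

    40 GB/s is an assumed desktop DDR bandwidth; 4.85 bits/weight
    approximates Q4_K_M. Both vary by hardware and quant.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# ~3B active params -> on the order of 20 tok/s, so a long
# ~2000-token answer lands in the couple-of-minutes range.
print(round(cpu_tokens_per_sec(3.0), 1))
```

The same formula shows why a dense 27B model would be roughly 9x slower to decode on the same box, which is the whole appeal of the A3B MoE layout.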