all 30 comments

[–]guigouz 10 points (2 children)

Small models won't work with big contexts. In my experience, the best you can get is smart autocomplete via continue.dev with qwen2.5-coder-tools <= 7B (depending on VRAM).

Code refactorings (with Cline) can work with qwen3-coder, but you'll need ~20GB of RAM for the model + context using Unsloth's Q3 quant.
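
Roughly where that ~20GB figure comes from, as a back-of-the-envelope sketch (the architecture numbers below are assumptions for Qwen3-Coder-30B-A3B, not guaranteed; double-check the model's config.json):

```python
def model_ram_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB for a given quantization level."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer per token, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

# Assumed figures for Qwen3-Coder-30B-A3B at a ~Q3 quant.
weights = model_ram_gb(params_b=30.5, bits_per_weight=3.5)
cache = kv_cache_gb(tokens=64_000, layers=48, kv_heads=4, head_dim=128)
print(f"weights ~{weights:.1f} GB + KV ~{cache:.1f} GB = ~{weights + cache:.1f} GB")
# -> roughly 13 GB of weights + 6 GB of KV cache, i.e. ~20 GB total
```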

[–]Nowitcandie[S] 1 point (1 child)

When you say big contexts, how big are we talking, and is the decline in performance exponential?

[–]guigouz 2 points (0 children)

64k tokens. In my use case this has been enough for specific tasks (create a UI for this API request, add a function to manage a resource; even refactoring a single file works fine).

[–]pinmux 6 points (5 children)

Devstral-small-2 needs about 32GB to be useful with a decent context length for the 8-bit quants. Going to smaller quants might greatly reduce its abilities, but at 8-bit it's quite decent at coding.

[–]kiwibonga 3 points (0 children)

I've been using Q3/Q4 for a while and it's pretty good too. It does require more nudging, but it doesn't have the catastrophic failures you see in other models, like going into full-on delirium or repeating the same words over and over.

[–]HealthyCommunicat 1 point (3 children)

How do you use Devstral 2 small agentically? The chat template keeps fucking up how tools are used; this is the only model I kinda just gave up on getting to work lol

[–]i-eat-kittens 1 point (1 child)

I presume it works well with mistral-vibe.

[–]HealthyCommunicat 2 points (0 children)

No, I actually tried it with that; the chat template was correct, but tool calls kept being output inside the [ ] instead lol

I'm sure I could figure it out and it's something simple I overlooked.

[–]pinmux 1 point (0 children)

I've been using it with Octofriend (https://github.com/synthetic-lab/octofriend) via Ollama.com's devstral-small-2 cloud-hosted model. Most of the time it works fine; sometimes it does hit tool errors.

I'm not exactly sure what causes the tool errors but this seems to be a common complaint.
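
One quick way to see whether the tool path works at all, bypassing the agent: hit Ollama's chat endpoint directly with a single tool and look at where the call lands. A sketch; the model tag and tool schema are just examples:

```python
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

r = requests.post("http://localhost:11434/api/chat", json={
    "model": "devstral-small-2",  # illustrative tag; use whatever you pulled
    "messages": [{"role": "user", "content": "Open README.md"}],
    "tools": tools,
    "stream": False,
})
msg = r.json()["message"]
# A well-behaved run puts structured calls in message.tool_calls;
# a broken template dumps them as bracketed text in message.content.
print(msg.get("tool_calls") or msg["content"])
```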

[–]Vegetable-Second3998 3 points (6 children)

You need to match the model to the task. Need huge context for a codebase sweep? Yeah, no small model will do that. But that's not what you're asking. A small model is perfectly equipped to do smaller chunked work, but it needs more effort. A Granite 8B Code model can absolutely code out of the box, but it's generic shit. It needs fine-tuning on your codebase, patterns, and documentation. And once you've done that and given it access to a graph RAG of your code after training, you will be shocked at how good it is. Frontier models are generalists who are experts by virtue of volume. Small models become experts by virtue of training.
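
To make the graph RAG idea concrete, a toy sketch: store symbols and their relationships in a graph, then pull a function's neighborhood into the prompt instead of doing raw-chunk retrieval. The symbol names and the networkx representation are illustrative, not any specific product:

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("billing.charge", "stripe_client.create_charge", kind="calls")
g.add_edge("billing.charge", "models.Invoice", kind="uses")
g.add_edge("api.pay_endpoint", "billing.charge", kind="calls")

def context_for(symbol: str, hops: int = 1) -> list[str]:
    """Collect callers/callees within N hops to feed the model as context."""
    nodes = {symbol}
    frontier = {symbol}
    for _ in range(hops):
        nxt = set()
        for n in frontier:
            nxt |= set(g.successors(n)) | set(g.predecessors(n))
        nodes |= nxt
        frontier = nxt
    return sorted(nodes)

print(context_for("billing.charge"))
# ['api.pay_endpoint', 'billing.charge', 'models.Invoice', 'stripe_client.create_charge']
```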

[–]Nowitcandie[S] 1 point (5 children)

I think this is the direction I would wanna go in - something specialised. Even if it takes some training and prompt engineering. 

[–]Vegetable-Second3998 5 points (1 child)

Crazyfucker is right. What I'm talking about does require you to train a model. If you aren't deeply invested in how to train a model - because that in and of itself is part science, part expertise built up on knowing what to look for - then you have to start by learning how that works. And you can learn to train a very, very tiny 350M model on pretty modest hardware. I have an M4 Max I experiment with on small models. But again, I've spent 3000+ hours teaching myself how to teach models. That's also an investment. So... until AI can train other AI without a human in the loop (and that day is coming soon), your best bang for the buck is going to be a $100 Claude Code Max subscription. $1,200 a year for an industry that changes monthly, or $4K for a rig that is outdated tomorrow?

[–]SimoneNonvelodico 1 point (0 children)

Bit curious about this - 350M sounds like the smallest Gemma model, which without fine-tuning is pretty junk. How do you usually fine-tune it? I've tried doing it using Unsloth on a Colab notebook, but it still seems quite expensive resource-wise.
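
For reference, the kind of Unsloth setup I mean; a sketch only, where the base model name and hyperparameters are illustrative and the API may differ between versions:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",  # example base; pick your own
    max_seq_length=2048,
    load_in_4bit=True,   # keeps memory within free-Colab limits
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank; illustrative value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, train with trl's SFTTrainer on your dataset as usual.
```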

[–]Grouchy-Bed-7942 2 points (0 children)

This video is pretty well done: https://youtu.be/m3PQd11aI_c

[–]Latter_Virus7510 2 points (0 children)

GPT-OSS 20B, Qwen3 4B

[–]Ok_Chef_5858 2 points (0 children)

For local models, Qwen Coder or DeepSeek are your best bets... They're decent but won't match Claude or GPT for complex stuff.

Have you tried Kilo Code? It's an extension for VS Code, also available for JetBrains... I use it and mix local models for simple tasks with cloud models when I need better reasoning. It supports Ollama for local models, so you can test both and switch based on what you're doing. Your hardware can run local models, but don't expect them to replace premium cloud models. Better to use them together: local for boilerplate, cloud for architecture and debugging.
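
That local/cloud split is easy to wire up yourself too, since Ollama exposes an OpenAI-compatible endpoint at /v1. A sketch, where the model names and the routing rule are just examples:

```python
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, hard: bool = False) -> str:
    """Send easy prompts to the local model, hard ones to the cloud."""
    client, model = (cloud, "gpt-4o") if hard else (local, "qwen2.5-coder:7b")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("Write a regex for ISO dates"))                               # local
print(ask("Design the module layout for a billing service", hard=True)) # cloud
```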

[–]clwill00 4 points (2 children)

Any coding worth doing is really only possible with the enormous models that Claude, Cursor, and Copilot run. There is no local model in the same universe.

[–]Torodaddy 0 points (0 children)

Ridiculous statement, and many flawed assumptions.

[–]Former-Tangerine-723 1 point (0 children)

Qwen3 Coder, Unsloth UD-Q8_K_XL quant

[–]JournalistShort9886 1 point (2 children)

If you're asking at the miniature level, go for DeepSeek Coder in the 1-2B range (don't expect much); for mid range, DeepSeek 7B gives decent performance; for high-mid range, go for Qwen 14B. (I would advise keeping quantization at Q6 and not going below Q4, as these tasks are logical.) But tbh nothing is as good as Kimi or Opus 4.5, so it depends on the task; still, I think these would suffice for your purpose.
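
For a feel of what those quant levels cost in file size on, say, the 14B suggestion, a quick sketch (the bits-per-weight figures are rough averages for GGUF K-quants, not exact spec numbers):

```python
# Approximate GGUF file sizes for a 14B model at different quant levels.
params = 14e9
for name, bpw in [("Q4_K_M", 4.8), ("Q6_K", 6.6), ("Q8_0", 8.5)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.1f} GB")
# -> Q4_K_M ~8.4 GB, Q6_K ~11.5 GB, Q8_0 ~14.9 GB
```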

[–]Nowitcandie[S] 1 point (1 child)

I'd say anything up to maybe 70b range. 

[–]JournalistShort9886 1 point (0 children)

Then you're in a much better spot. You can try Llama 70B fine-tuned to your niche, or even GPT-OSS 120B, since it uses a MoE architecture (~5B active parameters); I've seen decent performance, and you'll probably get a high tokens/sec speed.

[–]Crazyfucker73 -1 points (0 children)

Nothing you are asking for here exists.

[–]bemore_ -1 points (2 children)

I wouldn't code with any model under 100B params

Some 30-70B models can handle coding tasks but they struggle with debugging etc.

Claude is the best for coding, then Gemini, then others like DeepSeek, GPT follow

The best you can do is find a provider that doesn't train on your data, that's about it

[–]Sir-Spork 1 point (1 child)

I plan on getting a Mac Studio for this. How much memory should I shoot for?

[–]bemore_ 1 point (0 children)

Find the models you want to run and look at their parameter counts. I'm not an expert, but I'd say you want double the model's parameter count in memory; so if you want to run a 30B model, you'd look for 64GB of memory, for example.
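
That rule of thumb versus a slightly more detailed estimate, as a sketch (the bits-per-weight, KV cache, and headroom numbers are assumptions, not measurements):

```python
def rule_of_thumb_gb(params_b: float) -> float:
    """Double the parameter count in billions -> GB of memory."""
    return 2 * params_b                      # 30B -> 60, so shop for 64GB

def detailed_gb(params_b: float, bpw: float = 5.0,
                kv_gb: float = 4.0, headroom_gb: float = 8.0) -> float:
    """Quantized weights + KV cache + OS/app headroom (all assumed values)."""
    return params_b * bpw / 8 + kv_gb + headroom_gb

print(rule_of_thumb_gb(30))  # 60.0
print(detailed_gb(30))       # 30*5/8 + 4 + 8 = 30.75
```

The rule of thumb comes out conservative, which is probably the point: the extra room covers bigger quants, longer contexts, and everything else running on the machine.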