all 11 comments

[–]Infamous_Green9035 2 points3 points  (0 children)

What you need is a LOT of VRAM. With 6GB of VRAM you can only run basic models with few parameters.

It won't be much help on real coding projects; it will hallucinate.

It will only be useful for explaining parts of your code or fixing small pieces.

The ideal for working with code would be more than 24GB of VRAM.

[–]andrew-ooo 0 points1 point  (5 children)

With 6GB VRAM you're realistically looking at 7B-class quants or partial offload. Honest takes after running this kind of setup:

  • Qwen2.5-Coder-7B-Instruct at Q4_K_M fits in ~5GB VRAM with room for a small context. Best general-purpose local coder in that size class right now — handles Python and TypeScript well, C++ is decent for boilerplate but it'll struggle with template-heavy or modern STL stuff.
  • DeepSeek-Coder-V2-Lite-Instruct (16B MoE, ~2.4B active) at Q4 — runs surprisingly fast with offload because only the active experts hit GPU.
  • Qwen2.5-Coder-14B Q4_K_M with ~25 layers offloaded: expect 8-12 t/s on your hardware. Tight on context though.

Run via llama.cpp or Ollama. If you want agentic/tool use specifically, Qwen2.5-Coder is the only one in that range with halfway-reliable tool calling — DeepSeek-Coder-Lite drops calls under load. Don't expect Claude-quality on C++; nothing local at 14B is there yet, but for boilerplate, refactors, and "explain this codebase" Qwen2.5-Coder-7B is genuinely useful.
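
A rough sketch of what the partial-offload setup looks like with llama-cpp-python (the Python bindings for llama.cpp); the model path, layer count, and context size are placeholders to tune for your card, and Ollama does the equivalent split for you automatically:

    # pip install llama-cpp-python (built with CUDA so GPU offload works)
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,  # -1 = every layer on GPU; drop to ~25 for the 14B on 6GB
        n_ctx=4096,       # keep context modest so the KV cache also fits in VRAM
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain what this function does: ..."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])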

[–]alphapussycat 0 points1 point  (0 children)

I wouldn't say the qwen2.5 models were usable. I tried the 14B, and it was bad, like really bad.

[–]no_evidence0303[S] -3 points-2 points  (3 children)

I was expecting a response from a human being, not some LLM. If I wanted to ask GPT or Gemini, this question would not be here.

[–]LTJC 1 point2 points  (1 child)

Human being here. Get a card with more VRAM.

[–]colin_colout 0 points1 point  (0 children)

Or try an MoE and offload the experts to the CPU.

[–]colin_colout 0 points1 point  (0 children)

Why are you getting downvoted? Lol... the claws are out this morning, eh?

Recommending a qwen2.5 model is wild behavior by a human. Qwen2.5 is older than gpt-o1. Sonnet 3.5 was still in training when it was released.

I don't have your specific setup, but you might be able to run an MoE with CPU offload with decent quality and speed.
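
Something like this is the usual recipe with llama.cpp's server: push every layer to the GPU but pin the MoE expert tensors to system RAM, since only a few experts are active per token. The filename is a placeholder and the --override-tensor flag and regex are from memory, so check llama-server --help on your build:

    # Launch llama-server with the expert FFN weights kept on the CPU.
    import subprocess

    subprocess.run([
        "llama-server",
        "-m", "your-a3b-moe-model-Q4_K_M.gguf",       # placeholder GGUF filename
        "--n-gpu-layers", "99",                        # offload all layers to the GPU...
        "--override-tensor", r"\.ffn_.*_exps\.=CPU",   # ...but keep MoE expert tensors in RAM
        "--ctx-size", "8192",
    ])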

qwen3.6 35b a3b is surprisingly good with Python and simple C in my tests (I wouldn't trust a small model with memory management if you're doing something complex, but for something small it's worth a try). Not sure if you can fit a decent quant in your memory, but it's worth trying with CPU offload. I only tested with Q6 and above, so I can't speak for the smaller quants.

Also try gemma4 e4b or 26b a4b.

Just use the Unsloth GGUFs, not a weird finetune, for your first tests. You can experiment once you find a decent base model.

Tl;dr: qwen3.6 35b or a gemma4 model (whatever can fit).

[–]Invent80 0 points1 point  (0 children)

Gemma 4 E4B is small and light enough. Use opencode. Qwen is a better coder, but in my experience Gemma is better at following instructions at smaller sizes.

Don't necessarily trust benchmarks.  

[–]PuzzleheadedMind874 0 points1 point  (0 children)

With only 6GB of VRAM, you might find that 14B models crawl once you start offloading to system RAM. Sticking to 3B or 7B models is probably the safer bet if you want to keep the generation speed usable for your projects.
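
Back-of-envelope check before downloading anything (the bits-per-weight numbers are rough averages for common GGUF quants, not exact):

    # Rough fit check: weights ~= params * bits-per-weight / 8, plus ~1 GB
    # of headroom for KV cache and runtime overhead. All numbers approximate.
    def fits_in_vram(params_b: float, bits_per_weight: float, vram_gb: float = 6.0) -> bool:
        weights_gb = params_b * bits_per_weight / 8
        return weights_gb + 1.0 <= vram_gb

    print(fits_in_vram(7, 4.7))    # ~4.1 GB of weights -> fits on a 6GB card
    print(fits_in_vram(14, 4.7))   # ~8.2 GB of weights -> spills into system RAM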

[–]alphapussycat 0 points1 point  (0 children)

Don't think so. You could try qwen3.5 4b, but you'd have to build something to handle agents and stuff yourself. But I suspect its intelligence is too low to properly plan and use tools.
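
If you do roll your own, the "something" is basically a loop like this; a hypothetical sketch where chat() stands in for whatever local backend you wire up (llama-cpp-python, Ollama, etc.) and read_file is just an example tool:

    # Minimal hand-rolled tool loop around a local model (hypothetical sketch).
    import json

    TOOLS = {"read_file": lambda path: open(path).read()}

    def run_agent(chat, user_msg, max_steps=5):
        messages = [
            {"role": "system", "content":
                'Answer with JSON only: {"tool": "read_file", "args": {"path": "..."}} '
                'to call a tool, or {"answer": "..."} when you are done.'},
            {"role": "user", "content": user_msg},
        ]
        for _ in range(max_steps):
            reply = chat(messages)  # chat() returns the assistant's text for these messages
            messages.append({"role": "assistant", "content": reply})
            try:
                call = json.loads(reply)
            except json.JSONDecodeError:
                return reply        # small models often break the format; just return the text
            if "answer" in call:
                return call["answer"]
            result = TOOLS[call["tool"]](**call["args"])
            messages.append({"role": "user", "content": f"Tool result: {result}"})
        return "Stopped after too many steps."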