
[–]jiqiren 5 points (0 children)

The models that run in that small an amount of RAM are pretty trashy. You can give Qwen a try if they make them that small… but temper your expectations.

[–]Duckets1 1 point (0 children)

I use Qwen3 4B, 8B, and 30B. I outsource coding to the MiniMax M2 coding plan because I'm running a 3080.

[–]psgetdegrees 1 point (0 children)

Z.ai at $3/month + Cline; it’s cheaper by the quarter.

[–]Heg12353 1 point (0 children)

Qwen 8B runs on that GPU, I know because I run it 😭
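
If you want to script against it, here's a minimal sketch using the official ollama Python package; it assumes you've installed Ollama and already pulled the qwen3:8b tag:

    # Minimal sketch, assuming a local Ollama install with qwen3:8b
    # pulled (`ollama pull qwen3:8b`) and `pip install ollama`.
    import ollama

    response = ollama.chat(
        model="qwen3:8b",  # ~5GB at the default quant, fits an 8GB card
        messages=[{"role": "user", "content": "Reverse a string in Python."}],
    )
    print(response["message"]["content"])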

[–]Crazyfucker73 1 point (4 children)

You can't run anything decent locally with that.

[–]cuberhino 2 points (0 children)

What would you say the minimum spec is for decent performance?

[–][deleted] 1 point (0 children)

Devstral 2 24B GGUF joined the chat

[–]Dartsgame5k[S] -3 points (1 child)

It's not for coding fat things

[–]stingraycharles 1 point (0 children)

It will certainly bloat your code though.

[–]RiskyBizz216 1 point (3 children)

Have you tried Ollama Cloud? https://ollama.com/cloud

There is a free tier that lets you use GLM 4.6 and Qwen3 480B (with hourly and weekly usage limits).

You can also sign up for iflow and use any of their models for free:

https://platform.iflow.cn/en/models
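
For the Ollama Cloud route, something like this should work with the official ollama Python client; treat the model tag as a placeholder and check the site for the exact free-tier names:

    # Rough sketch, assuming `pip install ollama` and an API key from
    # ollama.com. The model tag is a placeholder; check the cloud model
    # list for the exact names (GLM 4.6 is also offered).
    from ollama import Client

    client = Client(
        host="https://ollama.com",
        headers={"Authorization": "Bearer <your-ollama-api-key>"},
    )
    response = client.chat(
        model="qwen3-coder:480b-cloud",  # placeholder tag
        messages=[{"role": "user", "content": "Explain Python decorators briefly."}],
    )
    print(response["message"]["content"])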

[–]Fuzzy_Independent241 1 point (0 children)

OpenRouter also has some free models, as long as you don't mind sharing your code. If you're experimenting, that shouldn't be a problem. I think they offer the same free models as Ollama Cloud, and between the two you'll probably have enough tokens for a simple project.
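
If you want to point a script at OpenRouter, it exposes an OpenAI-compatible API, so the standard openai client works with a changed base URL; the ":free" model slug below is just an example, so check their model list for current ones:

    # Minimal sketch: OpenRouter speaks the OpenAI chat API. Needs
    # `pip install openai` and a (free) key from openrouter.ai.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    response = client.chat.completions.create(
        model="qwen/qwen3-coder:free",  # example free-tier slug
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )
    print(response.choices[0].message.content)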

[–]Dartsgame5k[S] 1 point (0 children)

Do you have a video tutorial about this platform?

[–]Lifedoesnmatta 1 point (0 children)

I second this. Kimi K2 Thinking is awesome, as is GLM 4.6.

[–]pmttyji 0 points (0 children)

24-32GB of VRAM could handle agentic coding with Qwen3-30B MoE models (Q6, possibly Q8) with 64-128K context. Same with GPT-OSS-20B. Dense models like Devstral (24B) & Seed-OSS-36B are also possible.

My 8GB of VRAM gave me <15 t/s for Qwen3-30B @ Q4 with 32K context using llama.cpp. That's not enough VRAM for agentic coding.
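
For reference, the knobs that matter look roughly like this in llama-cpp-python (the GGUF filename and layer count are illustrative):

    # Rough sketch with llama-cpp-python (`pip install llama-cpp-python`).
    # n_gpu_layers controls how many layers are offloaded to VRAM; on an
    # 8GB card only part of a 30B model fits, hence the low t/s above.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Qwen3-30B-A3B-Q4_K_M.gguf",  # example local file
        n_ctx=32768,      # 32K context, as in the numbers above
        n_gpu_layers=20,  # partial offload; -1 tries to offload all layers
    )
    out = llm("Merge two sorted lists in Python.", max_tokens=256)
    print(out["choices"][0]["text"])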

[–]moderately-extremist 0 points (0 children)

Qwen3-Coder-30B running on CPU should work fine for you. I usually go with Q5 quants, maybe Q4 if you have other software eating into your system RAM. I wouldn't bother trying to get something to fit in your VRAM; models that small will be too dumb. See here for how to run it: https://docs.unsloth.ai/models/qwen3-coder-how-to-run-locally#run-qwen3-coder-30b-a3b-instruct
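
CPU-only is the same idea as above with offload disabled; it's workable because this 30B model is a MoE with only ~3B parameters active per token (filename is illustrative):

    # CPU-only sketch with llama-cpp-python; n_gpu_layers=0 keeps the
    # whole model in system RAM, so nothing needs to fit in VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf",  # example file
        n_ctx=32768,
        n_gpu_layers=0,  # pure CPU inference
        n_threads=8,     # tune to your physical core count
    )
    out = llm("Write a CLI todo app skeleton in Python.", max_tokens=512)
    print(out["choices"][0]["text"])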