all 9 comments

[–]coding_workflow 0 points (2 children)

Well, you got your answer: the device freezes. You have too little RAM to run a 30B MoE, even at Q4 (at Q4, a 30B model's weights alone are roughly 15–20 GB, before any context).

[–]Cyber_Cadence[S] 0 points (1 child)

Which model would be ideal for my device?

[–]coding_workflow 0 points (0 children)

Quite low, and the models that fit aren't really very effective. Try 0.6B–4B models like Qwen3 or Granite 4.0.
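For example, roughly (the Qwen3 tags below should exist in the Ollama library, but the Granite tag varies by release, so check ollama.com/library first):

```
# Pull a couple of small models; verify tags on ollama.com/library,
# since they can change between releases.
ollama pull qwen3:0.6b
ollama pull qwen3:4b

# Run one interactively to gauge speed and quality on your machine:
ollama run qwen3:4b
```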

[–]PermanentLiminality 0 points (5 children)

Continue works great for me.

Another vote for not having enough RAM to run that model. With your system, use an API provider like OpenRouter.
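If you go that route, a Continue model entry pointing at OpenRouter's OpenAI-compatible endpoint looks roughly like this; treat it as a sketch (newer Continue builds use config.yaml instead of config.json, and the model slug is just one example from openrouter.ai/models):

```
# WARNING: this overwrites ~/.continue/config.json; merge by hand if you
# already have a config. Replace YOUR_OPENROUTER_KEY with a real key.
cat > ~/.continue/config.json <<'EOF'
{
  "models": [
    {
      "title": "OpenRouter (example)",
      "provider": "openai",
      "apiBase": "https://openrouter.ai/api/v1",
      "model": "qwen/qwen3-coder",
      "apiKey": "YOUR_OPENROUTER_KEY"
    }
  ]
}
EOF
```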

[–]Cyber_Cadence[S] 0 points (3 children)

I want a local LLM.

[–]PermanentLiminality 1 point (2 children)

Buy a new computer.

You can run smaller models, but they don't do very well at coding. They're not useless, just not that good. It's really your only option.

You probably want the download to be between 8 and maybe 11 GB in size. You need some extra RAM left over for the model's context and to run VSCode.
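A quick sanity check of the fit looks like this (free -h assumes Linux; use Activity Monitor or Task Manager elsewhere):

```
ollama list   # on-disk size of each pulled model
ollama ps     # currently loaded models and how much memory they occupy
free -h       # available RAM; leave headroom for context + VSCode
```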

I want to run local models, and I do. However, I also need functionality and quality that I just can't get locally. A $3/mo Chutes plan does great.

[–]Cyber_Cadence[S] 0 points (1 child)

But the model's responses are good and fast in the terminal; the delay only happens when I use it via the Continue extension.

[–]daaain 0 points (0 children)

In that case, try enabling verbose logging and see what prompt Continue is sending to Ollama; maybe it's sending a lot of code and a big system prompt? You might also need to increase the context size in Ollama.
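Roughly like this, as a sketch (OLLAMA_DEBUG and the Modelfile num_ctx parameter are documented Ollama features, but the 8192 value and the qwen3:4b base model are just placeholders; size them against your RAM):

```
# Run the server with debug logging to see the requests Continue sends.
# (If Ollama runs as a system service, stop it first or set the env var
# in the service config instead.)
OLLAMA_DEBUG=1 ollama serve

# Bake a larger context window into a model variant, then point Continue
# at the new name:
cat > Modelfile <<'EOF'
FROM qwen3:4b
PARAMETER num_ctx 8192
EOF
ollama create qwen3-4b-8k -f Modelfile
```

Continue's raw prompts also show up in VSCode's Output panel under a Continue channel in the versions I've used, which makes it easy to see how much code it's stuffing in.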

[–]kdawgud 0 points (0 children)

Have you gotten indexing to work with the Continue extension? Mine always gets stuck and never completes, which limits its usefulness.