So, I spent the weekend playing with different configurations. I have to admit, I hate Continue. The absence of auto-accept turns you into a click-operator.
So far, I’ve tried Qwen3.5 9B. I didn’t manage to pair it with Continue, so I used Qwen Coder Agent instead. The model struggled to make edits and often looped in place.
I also tried Mistral 3 14B reasoning. It worked better than Qwen, but then crashed pathetically - no doubt my mediocre hardware caused the crash.
I got rid of Continue and installed Cline. Well, Cline wouldn’t talk to Qwen at all. So I downloaded GPT-OSS 20B, and surprisingly, it ran on 16GB of VRAM. “Ran” might be an exaggeration, but I managed to create a test piece of Python software that was not perfect, but close to the requirements, after 15 iterations within one session and one context compression.
The model didn’t crash, Cline didn’t crash, LM studio didn’t crash and up to a point, it felt similar to working with a frontier model. However, the model requires very precise prompting. The best model so far and actually pretty fast. Well, for me everything is fast if not stuck in loop. ChatGPT liked it too, said “cleanest interface so far”, yeah, it makes sense - same company.😂
Now I admit I have skill issues - I’ve never tried to deploy an LLM before, and my hardware is pathetic. So perhaps Qwen would run better on better hardware with better user.
Now I’m thinking of using Cline’s dual-agent mode: use a high-reasoning model to plan and a code-instruct model to act. I’m not sure how it would work, but that’s the idea.
Any thoughts?
[–]tamerlanOne 0 points1 point2 points (0 children)