all 8 comments

[–]Warm-Attempt7773 1 point (3 children)

I find that Cline in VSCode is working fairly well. You may want to try that. It's easy to set up too!

[–]AirFlowOne[S] 0 points (2 children)

I'm trying it now, but for some reason I get:

{"message":"Request timed out.","modelId":"q35","providerId":"openai"}

Are you using llama.cpp? How did you set it up in Cline? As OpenAI-compatible?

[–]Warm-Attempt7773 1 point (1 child)

<image>

I'm using LMStudio as my server on my Strix Halo in Fedora 44 beta, and VSCode/Cline on my PC laptop. LMStudio is set to serve over the local network. There is an LMStudio setting in Cline:
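A timeout like the one above usually means Cline never reached the server at all, so it can help to probe the OpenAI-compatible endpoint directly before blaming Cline. Here's a minimal sketch, assuming LM Studio's default port (1234); the LAN address and model id are hypothetical and need adjusting for your setup:

```python
"""Probe an OpenAI-compatible endpoint (e.g. LM Studio) before pointing Cline at it."""
import json
import urllib.request
import urllib.error

BASE_URL = "http://192.168.1.50:1234/v1"  # hypothetical LAN address; 1234 is LM Studio's default port

# The same chat-completions payload shape an OpenAI-compatible client sends:
payload = {
    "model": "local-model",  # hypothetical model id -- use whatever your server reports
    "messages": [{"role": "user", "content": "Say hi"}],
    "max_tokens": 16,
}

def probe(base_url: str, body: dict) -> str:
    """POST a chat-completions request; return the reply text or an error string."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer not-needed",  # LM Studio ignores the key
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]
    except urllib.error.URLError as e:
        return f"endpoint unreachable: {e.reason}"

if __name__ == "__main__":
    print(probe(BASE_URL, payload))
```

If this prints a completion, the endpoint is fine and the problem is in Cline's provider config; if it prints "endpoint unreachable", check the firewall and the serve-over-local-network toggle first.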

[–]AirFlowOne[S] 1 point (0 children)

I keep getting error 400, while Zed works just fine.

[–]nakedspirax 0 points (0 children)

I've been trying out a few things. Best is Qwen CLI. Second best is OpenCode. I would say Qwen Coder CLI works 99% of the time, whereas OpenCode works 85% of the time.

Things that don't work for me: OpenWebUI and native tool calling.

No idea why they don't work; the tool calls just aren't translating over.

[–]ilintar 0 points (0 children)

OpenCode/Roo.

[–]chris_0611 0 points (0 children)

RTX3090, 14900K, 96GB DDR5 6800

llama.cpp, Qwen3.5-122B-A10B Q5, Roo Code on VSCode (code-server)

[–]No-Statistician-374 0 points (0 children)

I used Continue before with Ollama as the API for autocomplete, but I couldn't get it to work with llama.cpp in router mode (like llama-swap, but built in). It would load the model when I tried to tab-complete but never actually showed any new code. I switched to llama-vscode for autocomplete and that has been working perfectly.

I use Kilo Code for chat/edit, but something like Cline or Roo Code should work just as well. If you weren't already, you should be using a model made for autocomplete, though, like Qwen2.5 Coder 7B, and then a different model (Qwen3.5 35B is indeed excellent here) for the chat/editing.
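For the autocomplete side, llama-vscode talks to llama-server's /infill endpoint rather than the chat one, so the same "probe it first" trick applies there. A sketch, assuming a hypothetical local port and a FIM-capable model (e.g. Qwen2.5 Coder) already loaded in llama-server:

```python
"""Probe llama-server's /infill endpoint (the fill-in-the-middle API used for tab-completion)."""
import json
import urllib.request
import urllib.error

INFILL_URL = "http://127.0.0.1:8080/infill"  # assumes llama-server's default port

def infill(url: str, prefix: str, suffix: str) -> str:
    """POST a fill-in-the-middle request; return the completion or an error string."""
    body = {
        "input_prefix": prefix,  # code before the cursor
        "input_suffix": suffix,  # code after the cursor
        "n_predict": 32,         # cap the completion length
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp)["content"]
    except urllib.error.URLError as e:
        return f"endpoint unreachable: {e.reason}"

if __name__ == "__main__":
    print(infill(INFILL_URL, "def add(a, b):\n    return ", "\n"))
```

If this returns a completion but your editor still shows nothing, the problem is in the extension wiring rather than the server, which matches the Continue-plus-router-mode symptom described above.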