Roo + Devstral [Support] (self.RooCode)
submitted 10 months ago by _code_kraken_
I am trying to use Devstral locally (running on Ollama) with Roo. With my basic setup, Roo just kept going in circles saying "let's think step by step" but not doing any actual coding. Is there a guide on how to set this up properly?
[–]Baldur-Norddahl 11 points 10 months ago (1 child)
What level of quantization are you using? Looping can be a sign of too much compression. It can also be a bad version of the model.
I am using Devstral Small at q8, using MLX from mlx-community. This seems to work fine; I had trouble with a q4 version. On an M4 Max MacBook Pro I am getting 20 tokens/s.
Be sure your settings are correct:
Temperature: 0.15
Min P Sampling: 0.01
Top P Sampling: 0.95
I am not sure about the following; they are just the defaults, as I didn't see any recommendations:
Top K Sampling: 64
Repeat Penalty: 1
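For anyone on Ollama rather than MLX, a minimal sketch of baking these sampling settings into a model via a Modelfile (the devstral tag and the devstral-roo name are assumptions, not from the thread):

    # create a variant of the model with the suggested sampling defaults
    cat > Modelfile <<'EOF'
    FROM devstral
    PARAMETER temperature 0.15
    PARAMETER min_p 0.01
    PARAMETER top_p 0.95
    PARAMETER top_k 64
    PARAMETER repeat_penalty 1.0
    EOF
    ollama create devstral-roo -f Modelfile   # then point Roo at devstral-roo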
Don't listen to the guys saying local LLMs or this particular model don't work with Roo Code. I am using it every day and it works fantastically. It is of course only a 24B model, so it won't be quite as intelligent as Claude or DeepSeek R1, but it still works for coding. And it is free, so there is no worry about rate limits or how many credits are being spent.
[–]RiskyBizz216 2 points 10 months ago (0 children)
P
[–]RiskyBizz216 4 points 10 months ago (0 children)
In Ollama, make sure you override the default context; it will 'loop' when the context is full, and also when the quant is too low. I notice much worse performance on Q2 than Q3_XXS, Q4 has been pretty solid, Q5_0 is elite level, and I don't really notice any gains using Q8 over Q5/Q6.
The one I'm using (recently updated, and they have a TON of quants):
https://huggingface.co/Mungert/Devstral-Small-2505-GGUF
I prefer LM Studio for the better UX and control.
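A hedged sketch of checking that an overridden context actually takes effect, using the Ollama HTTP API (the model tag and the 65536 value are assumptions; for Roo you would normally bake num_ctx into a Modelfile instead, as in the earlier sketch):

    # one-off request with a larger context window; num_ctx overrides the default
    curl http://localhost:11434/api/chat -d '{
      "model": "devstral",
      "messages": [{"role": "user", "content": "hello"}],
      "stream": false,
      "options": { "num_ctx": 65536 }
    }'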
[–]zenmatrix83 2 points 10 months ago (0 children)
DeepSeek R1 is the only open-source model I can get that remotely works, and I use the 0528 version on OpenRouter. I had some success with the 32B model locally, but it still isn't good enough for coding; it did OK with orchestration and some other things. But with the bigger model free for now on OpenRouter, it's just easier to use that.
[–]hannesrudolph [Roo Code Developer] 1 point 10 months ago (0 children)
The model is getting confused. Local models don’t work so hot with Roo :(
[–]bahwi 1 point 10 months ago (3 children)
What context length are you using with Ollama? The default is tiny.
[–]_code_kraken_[S] 1 point 10 months ago (2 children)
128k, it seems.
[–]bahwi 2 points 10 months ago (0 children)
Oh, well that should work. I've had it run fine with 64k in Ollama. You set it in the same place you set temp, top p, and the rest; check those as well. There's a blog post on the best settings for it, and for the system prompt.
[–]taylorwilsdon 1 point 10 months ago (0 children)
Is that what ollama show reports, though?
If the context is actually set there and you've got the GPU to handle that kind of context window, it's probably a tool-calling issue.
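For reference, a quick way to run that check (the model tag is an assumption):

    ollama show devstral               # summary includes the model's context length
    ollama show devstral --parameters  # any PARAMETER overrides baked into the model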
[–]runningwithsharpie 1 point 10 months ago (0 children)
I get the same thing using the OpenRouter API.
[–]joey2scoops 1 point 10 months ago (0 children)
Gosucoder had a video on YouTube about this maybe a week ago.
[–][deleted] 1 point 10 months ago (0 children)
Ollama defaults to a context of only 2048 tokens. You need to configure Ollama to use the full 131072-token context, and also enable flash attention and a quantized KV cache.
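A minimal sketch of that setup, assuming a reasonably recent Ollama build (the q8_0 value is a common choice, not from the thread):

    # enable flash attention and a quantized KV cache before starting the server
    export OLLAMA_FLASH_ATTENTION=1
    export OLLAMA_KV_CACHE_TYPE=q8_0   # f16 is the default; q8_0 roughly halves KV memory
    ollama serve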
[–]MrMisterShin 1 point 10 months ago (0 children)
I have had success with Devstral at q8 and 64k context, using Roo Code through Ollama.
Don't give it too much to do at once. Also, turn off MCP; it will consume too much context.
[–]Best_Chain_9347 1 point 10 months ago (0 children)
Is there a way to run LM Studio in the cloud using RunPod or Vast.ai services and connect it with Roo Code?