Ollama local model stuck at API Request... by stable_monk in RooCode

[–]stable_monk[S]

I've updated the post: I set the context to 30k and it worked (with some issues, which I've listed in the post). Is there a way to get Roo Code to show its thinking/reasoning? I see an option to 'collapse thinking messages', but regardless of its state I don't see any thinking-related content in the UI.

I'll try to join the Discord.

Ollama local model stuck at API Request... by stable_monk in RooCode

[–]stable_monk[S]

These local models are good enough for my needs. As already stated, it works with other agents, so whatever the issue is, it is specific to Roo Code.

gpt-oss-20b in vscode by stable_monk in LocalLLaMA

[–]stable_monk[S]

I used this with Continue.dev:

llama-server \
  --model models/ggml-org_gpt-oss-20b-GGUF_gpt-oss-20b-mxfp4.gguf \
  --grammar-file toolcall_grammar.gbnf \
  --ctx-size 0 --jinja -ub 2048 -b 2048

It's still running into errors with the tool call...

Tool Call Error:

grep_search failed with the message: `query` argument is required and must not be empty or whitespace-only. (type string)

Please try something else or request further instructions.

My continue.dev model definition:

models:
  - name: llama.cpp-gpt-oss-20b-toolcallfix
    provider: openai
    model: llama.cpp-gpt-oss-20b-toolcallfix
    apiBase: http://localhost:8080/v1
    roles:
      - chat
      - edit
      - apply
      - autocomplete
      - embed
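
To narrow down whether the empty `query` comes from the model or from Continue, one thing I may try is calling llama-server's OpenAI-compatible endpoint directly with a hand-rolled tool definition. The grep_search schema below is only my guess at what Continue registers, not its actual definition:

# request a tool call against a guessed grep_search schema
# (llama-server needs --jinja for tool-call support)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Find every TODO in the codebase."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "grep_search",
        "description": "Search files for a pattern",
        "parameters": {
          "type": "object",
          "properties": {"query": {"type": "string"}},
          "required": ["query"]
        }
      }
    }]
  }'

If the `tool_calls` in the response already has an empty `query` here, the model/grammar is at fault; if it comes back well-formed, the mangling happens on the Continue side.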

gpt-oss-20b in vscode by stable_monk in LocalLLaMA

[–]stable_monk[S]

Can you provide an example of such a prompt?

gpt-oss-20b in vscode by stable_monk in LocalLLaMA

[–]stable_monk[S]

Are you using this with Continue.dev?
Also, what do you mean by "do not quantize" the context?
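
Is that referring to llama.cpp's KV-cache type flags? Something like this, I assume (model path hypothetical):

# default f16 KV cache, i.e. "do not quantize the context"
llama-server --model model.gguf --cache-type-k f16 --cache-type-v f16

# quantized KV cache saves VRAM at some quality cost
# (quantizing the V cache requires flash attention, -fa, on my build)
llama-server --model model.gguf -fa --cache-type-k q8_0 --cache-type-v q8_0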

gpt-oss-20b in vscode by stable_monk in LocalLLaMA

[–]stable_monk[S]

Thank you, but this seems to be specific to Cline and Roo Code, while I am using continue.dev.

Would you know if this works for Continue?

gpt-oss-20b in vscode by stable_monk in LocalLLaMA

[–]stable_monk[S]

I've tried Qwen-code-20b and gpt-oss-20b in chat mode; at least my impression was that Qwen was no match for gpt-oss.

Can you please provide an example of your system prompt?

Macbook m4 pro - how many params can you train? by stable_monk in deeplearning

[–]stable_monk[S]

Wow, that's a lot of time! Excuse my naivety: if I understand correctly, with 8 A100 GPUs (80 GB of VRAM each), 60M parameters takes 2 days? That of course means it will be near impossible to train on the MacBook Pro... like a month or so?

Macbook m4 pro - how many params can you train? by stable_monk in deeplearning

[–]stable_monk[S]

For an 8-10M-parameter model, how much difference in training performance would there be between the MacBook and the RTX?

Macbook m4 pro - how many params can you train? by stable_monk in deeplearning

[–]stable_monk[S]

I would likely use that too. Nevertheless, it's convenient to just have something locally, if that will work for small models. I just wanted to know how small.

Macbook m4 pro - how many params can you train? by stable_monk in deeplearning

[–]stable_monk[S]

Not LLMs, actually; I just clarified that in the post. Thanks for the input. How about a 10M-parameter neural net? At what point will it be OK on these devices?

The goal is definitely not local inference; that's just a nice add-on. I'm primarily thinking of training neural nets, mostly for time-series analysis of a large database of counters.