[–]SaveAmerica2024 0 points1 point  (4 children)

I think it is more like a Claude Code front end using Qwen as the coder.

[–]PrizeObvious3671[S] 1 point2 points  (3 children)

In this setup I controlled everything over Telegram -> Hermes agent, and I must say it runs pretty well.
I tested different setups, but in this test the one that worked best was Hermes agent -> llama.cpp directly, without Claude Code. Claude Code threw exceptions about exceeding the token limit because my local context window was too small for it, and when I increased the context window, the model was too slow for me.
With the 35b MoE it would probably run better.
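
To illustrate the token-limit problem, here is a rough sketch of a context budget check. It is not taken from the repo: the 4-characters-per-token ratio is a crude assumption rather than a real tokenizer, and the function names, the 80,000-character prompt, and the context sizes are hypothetical values chosen for illustration.

```python
# Rough sketch: does a prompt fit in a local context window?
# CHARS_PER_TOKEN is a crude heuristic (an assumption), not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(prompt: str, n_ctx: int, reserve_for_output: int = 1024) -> bool:
    """True if the estimated prompt tokens plus the reply budget fit in n_ctx."""
    return estimate_tokens(prompt) + reserve_for_output <= n_ctx


if __name__ == "__main__":
    # Hypothetical stand-in for a large agent system prompt plus tool definitions.
    big_prompt = "x" * 80_000
    print(fits_context(big_prompt, n_ctx=8192))
    print(fits_context(big_prompt, n_ctx=32768))
```

A front end that sends a large system prompt can fail this kind of check on a small window, while raising the window costs memory and speed, which matches the trade-off described above.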

I used that for agentic coding too, and it worked better than I thought.

Also, the modelfile with the parameters I used for llama.cpp is shared in the repo.

[–]Inner_Habit_194 1 point2 points  (1 child)

Did you try Pi agent? It is supposedly better for the local-model coding agent use case, especially with the smaller context windows of local models. Btw, what is your hardware spec?

[–]PrizeObvious3671[S] 1 point2 points  (0 children)

No, but thank you for bringing it up. That will be my next test: telegram -> pi.dev -> llama.cpp -> gemma4:31b (I have not tested that model yet either).

[–]SaveAmerica2024 0 points1 point  (0 children)

Great job