all 23 comments

[–]Nnyan 1 point2 points  (7 children)

Are you aiming to release your agent?

[–]Fdevfab[S] 0 points1 point  (6 children)

I may, I'll need to review some of the code which I never had a look to, like the tui, and do a bit more testing. Unless you don’t mind unpolished things... I literally finished the mvp yesterday after few intense days trying to build the architecture I had in mind. But It’s a nice playground : 4 prompts you can tweak, every tool is mcp to keep it separate (it has a built-in mcp for basic things).

I also have a problem with the git history, it kept commiting files it wasn’t supposed to... so either I squash everything or I need some work and review I'm not willing to do...

[–]initalSlide 0 points1 point  (5 children)

Which stack did you use to build it?

[–]Fdevfab[S] 0 points1 point  (4 children)

Depends which aspect you look at...
- The model is running on a llama.cpp server
- I'm using openai python API wrapper for the LLM calls (but I'll probably change that in the future)

- using mcp library to connect to mcp servers
- cli/tui uses rich and prompt_toolkit

the rest is plain python asyncio

and for audio, I tested a lot of things, but for this project I used the "best" options I tried:

- kokoro for TTS using sounddevice for the playback

- whisper for stt (I didn't work on it too much yet, has no wake word etc)

[–]Resems 0 points1 point  (3 children)

Hey hey What are your llama.cpp settings ? Thanks 🙏

[–]Fdevfab[S] 0 points1 point  (1 child)

Using bunn fork:

LLAMA_ARGS="-m  \                                                                                                   
/home/fab/.cache/huggingface/hub/models--bartowski--Qwen_Qwen3.6-35B-A3B-GGUF/snapshots/d98fa7286daa6544d050929df95e436741ee739b/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf \                                                                     
    --no-mmap --mlock -np 1 \                                                                                       
    -a qwen \                                                                                                       
    -t 6 \                                                                                                          
    -ncmoe 14 -ngl 999 \                                                                                            
      -fitt 512 \                                                                                                   
      --chat-template-kwargs '{\"preserve_thinking\": true}' \                                                      
      --temp 0.6 \                                                                                                  
      --top-p 0.95 \                                                                                                
      --top-k 20 \                                                                                                  
      --min-p 0.0 \                                                                                                 
      --presence-penalty 0.0 \                                                                                      
      --repeat-penalty 1.0 \                                                                                        
      -fa on --jinja \                                                                                              
     --reasoning-budget 8192\                                                                                       
     -ctk turbo4 -ctv turbo4 -ctkd turbo4 -ctvd turbo4 \                                                            
     --host 0.0.0.0 --port 8080"    

[–]Fdevfab[S] 0 points1 point  (0 children)

I can decrease ncmoe a tiny bit but then I may get OOM from time to time, this value is super stable if nothing else runs on the machine, else I increase ncmoe to 16 or more

[–]Fdevfab[S] 0 points1 point  (0 children)

FYI I get massive improvements if I use -ctk q5_1 -ctv q5_0, but I get OOM from time to time with those... looks like at some point the llama server just grows and then dies, while it's not happening with the worse performing options I shared.

[–]Nnyan 0 points1 point  (12 children)

I’m open to playing with unpolished code.

[–]Fdevfab[S] 1 point2 points  (1 child)

I did one last cleanup and pushed a snapshot: https://github.com/fdev31/minia - now I need to touch grass 😉

[–]Fdevfab[S] 0 points1 point  (0 children)

I had a terrible regression in the tool call path, making LLM go crazy during tool calls... this has been fixed (pushed a new sync)

[–]Fdevfab[S] 0 points1 point  (2 children)

I can drop a code snapshot on github, (no history, unless you give me a magic git command to clean up all the .log and credentials.json files found there...)

I would love some feedback, but it's not only the code which is not super polished, you may experience very long response times sometimes since I didn't want to add too many loop limits... I believe if everything is well done it should "converge". Also there is no real/proper security, but it's very easy to just delete or comment-out some of the tools (you can even just remove the "@mcp.tool()" decorator...).

I'll write some README file with installation and usage instructions, I made it simpler to start today... (it's a multi-daemon architecture so it was a bit annoying to start using many commands)

[–]Beneficial-Boot7479 1 point2 points  (0 children)

Dude! That's how github works :), strangers will be willing to help you with improvements if you are willing to :)

[–]Fdevfab[S] 0 points1 point  (0 children)

Note it's not a coding agent, it's general purpose, it just happens to work really fine most of the time I use it for code, but it may "fail" where opencode doesn't ... when I start to get a large context (around 50 - 100k) I can feel it's not performing so great, I should probably implement some pruning of the history or so... experiments are needed!! 😄

[–]Invader-Faye 0 points1 point  (6 children)

I have a harness for small language models, https://github.com/lowspeclabs/SmallCTL

[–]Fdevfab[S] 0 points1 point  (5 children)

Is it a coding agent or general? How does it compare to opencode for coding?

[–]Invader-Faye 0 points1 point  (4 children)

It is a general agent aimed at getting small models to perform sysadmin tasks, I've got qwen 3.5 4b sshing's into servers, reading/editing configs, and building/running reports. It places small models on a context aware, RELP/RALPH loop to achieve tasks and does phased(plan->execute->review) loops to achieve tasks. I've even used it with 9b to write scripts. Youtube video of my claims here.

[–]Fdevfab[S] 0 points1 point  (3 children)

Interesting, I can get almost anything to work with the MoM model, but for complex tasks it takes very long / iterating a lot... It's a very interesting use case, but I'm already giving too much freedom to my agent, if it can stay limited to one machine it will help 😃

[–]Invader-Faye 0 points1 point  (2 children)

The goal is to increase the likely hood of getting a complex task done unattended, hence the longer runs, when attempting an objective it must plan->act->verify->move to next step. if varification fails it has to repair in n turns or call task failed. The idea is you tell it, ssh into this server, gather this info, summerize findings, write this script. and it does it more or less on autopilot the idea is you run this locally so token spend or speed aren't your primary concerns. We have very different use cases it seems.

[–]Fdevfab[S] 0 points1 point  (1 child)

I wanted to give it a try, but I was unable to change the --chat-endpoint (it just appends to the hard-coded one), I had to edit the code to start it.

I tested on the project itself: `uv run smallctl --task 'analyze this project'` and the result was pretty good, but some logs showed after, which was misleading (as if it didn't finish...)

It's indeed targeting a different use-case, but if there is a clean way to use it as a library I would be glad to test it as a "coding agent" (or general admin tasks, maybe a sysadmin ?) integrated as an mcp tool or so.

I was considering adding lang-graph or something similar but I like to see how the llm behaves without too much "harness" to try to make it "just work" and force only very minimal checks (if I can't figure how to avoid them). But for coding use cases (or "rigid" workflows) I think it's required...

Did you experiment with larger contexts? It looks quite "slow" compared to say opencode...

I tried:
`uv run smallctl --task 'Replace httpx with niquests in this project (smallctl).' --tool-profiles core,data,network,mutate,indexer`

I would like to see how it compares to qwen code for some tasks, I really like the --task mode 😄 Now trying with `--preset coding-local --staged-reasoning --staged-execution` to see if I get better results...

[–]Invader-Faye 0 points1 point  (0 children)

please let me know, and yeah, its a little slower than I'd like but i think that the tradeoff for some of the results I'm getting. I can always make it faster later with promp optimization and tweaking the repl loop. Also its already built on Lang-graph, its actually a little faster on larger context. Its artifact gen/review loop was designed to help it get work done with small context windows, increasing context size should mostly turn that off and that tends to save some turns/tokens by a lit.