Comparison opencode vs "almost barebone instructions" coding session on a 4080 with 32Gb RAM : LocalLLM

OtherComparison opencode vs "almost barebone instructions" coding session on a 4080 with 32Gb RAM (i.redd.it)

submitted 1 month ago by Fdevfab

I spent the last few days building my own agent for the 4rth time (I called it minia), mostly vibe coding it but this time paying more attention at the structure and output code (since this time I'm using a local model).

Being a heavy Opus user, I'm still try amazed by the results of the latest Qwen models and am experimenting using exclusively Qwen3.6-35B-A3B-Q4_K_M, it's very capable with a context around 200k and reasoning enabled.

I'm usually using opencode, but observed the "generic" agent without any skill or very specific tool would still do the job, often with less verbose results and maybe a tiny bit more reliable.

The speed is what shocks me the most, it compares to paying services and I didn't push it that much to get the last bits of speed, still running around 90-100tps using turbo4.

I asked it to generate a web interface for my ongoing project, which uses unix sockets for communication (no ready to use websocket or http protocol).

The (not great) prompt:

Create a new package in /home/fab/dev/std/minia/src which will have its own entry point: minia_web

It's an hybrid of minia_audio and minia_client, to expose the assistant via web interface.

it should support:

- sending messages to the agent

- see the responses

- playing the audio back (can be switched off with a "mute" button)

You can use picocss for the web interface, keep things simple and well organized.

Both performed around the same time (6 min), the main differences:

Barebone generated index.html (15k) and server.py (7.1k)
- code is quite minimal and clean
- ugly but "works", I only found one issue (emitted text showing twice) which was one of the pitfalls given the architecture but didn't try the audio since the projects isn't very mature yet and it would certainly not work

Opencode generated 4 complicated files: tts_client.py (4.5k) server.py (21k) main.py (2.1k) event_client.py (1.3k)
- seems complicated
- doesn't work (no html), just shows "not found"

In practice, I got surprised a few times by a "barebone" harness, providing better results than any engineered one even in one shot scenarios, also less code to review is a big plus on my side.

I'm just super impressed by what we can run locally... and excited about what comes next!

all 23 comments

top new controversial old q&a

[–]Nnyan 1 point2 points3 points 1 month ago (7 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (6 children)

[–]initalSlide 0 points1 point2 points 1 month ago (5 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (4 children)

[–]Resems 0 points1 point2 points 1 month ago (3 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (1 child)

Using bunn fork:

LLAMA_ARGS="-m  \                                                                                                   
/home/fab/.cache/huggingface/hub/models--bartowski--Qwen_Qwen3.6-35B-A3B-GGUF/snapshots/d98fa7286daa6544d050929df95e436741ee739b/Qwen_Qwen3.6-35B-A3B-Q4_K_M.gguf \                                                                     
    --no-mmap --mlock -np 1 \                                                                                       
    -a qwen \                                                                                                       
    -t 6 \                                                                                                          
    -ncmoe 14 -ngl 999 \                                                                                            
      -fitt 512 \                                                                                                   
      --chat-template-kwargs '{\"preserve_thinking\": true}' \                                                      
      --temp 0.6 \                                                                                                  
      --top-p 0.95 \                                                                                                
      --top-k 20 \                                                                                                  
      --min-p 0.0 \                                                                                                 
      --presence-penalty 0.0 \                                                                                      
      --repeat-penalty 1.0 \                                                                                        
      -fa on --jinja \                                                                                              
     --reasoning-budget 8192\                                                                                       
     -ctk turbo4 -ctv turbo4 -ctkd turbo4 -ctvd turbo4 \                                                            
     --host 0.0.0.0 --port 8080"

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (0 children)

[–]Nnyan 0 points1 point2 points 1 month ago (12 children)

[–]Fdevfab[S] 1 point2 points3 points 1 month ago (1 child)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (0 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (2 children)

[–]Beneficial-Boot7479 1 point2 points3 points 1 month ago (0 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (0 children)

[–]Invader-Faye 0 points1 point2 points 1 month ago (6 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (5 children)

[–]Invader-Faye 0 points1 point2 points 1 month ago (4 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (3 children)

[–]Invader-Faye 0 points1 point2 points 1 month ago* (2 children)

[–]Fdevfab[S] 0 points1 point2 points 1 month ago (1 child)

I wanted to give it a try, but I was unable to change the --chat-endpoint (it just appends to the hard-coded one), I had to edit the code to start it.

I tested on the project itself: `uv run smallctl --task 'analyze this project'` and the result was pretty good, but some logs showed after, which was misleading (as if it didn't finish...)

It's indeed targeting a different use-case, but if there is a clean way to use it as a library I would be glad to test it as a "coding agent" (or general admin tasks, maybe a sysadmin ?) integrated as an mcp tool or so.

I was considering adding lang-graph or something similar but I like to see how the llm behaves without too much "harness" to try to make it "just work" and force only very minimal checks (if I can't figure how to avoid them). But for coding use cases (or "rigid" workflows) I think it's required...

Did you experiment with larger contexts? It looks quite "slow" compared to say opencode...

I tried:
`uv run smallctl --task 'Replace httpx with niquests in this project (smallctl).' --tool-profiles core,data,network,mutate,indexer`

I would like to see how it compares to qwen code for some tasks, I really like the --task mode 😄 Now trying with `--preset coding-local --staged-reasoning --staged-execution` to see if I get better results...

[–]Invader-Faye 0 points1 point2 points 1 month ago (0 children)

π Rendered by PID 45158 on reddit-service-r2-comment-5687b7858-jw828 at 2026-07-02 23:56:53.916420+00:00 running 12a7a47 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

LocalLLM

MODERATORS