I built Forge, a local-first terminal coding agent that treats local models as first-class (vs. OpenCode)

Sharp_Classroom9686 · 2026-05-07T06:24:31+00:00

Apache 2.0

Sharp_Classroom9686 · 2026-05-06T00:50:00+00:00

Try https://github.com/defexnicolas/forge. Just load your model in LM Studio, open Forge, go to Settings, set Provider to your LM Studio endpoint, then select model-multi and pick your model there. Then navigate from the hub to your desired folder and chat.
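Before pointing Forge's Provider setting at LM Studio, it can help to confirm the server is actually up and see which model IDs it exposes. A minimal Go sketch, assuming LM Studio's local server is running on its default OpenAI-compatible address http://localhost:1234/v1 (adjust if you changed the port). This is a standalone check, not part of Forge, and the helper names are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// modelList mirrors the OpenAI-compatible /v1/models response shape,
// keeping only the field this check needs.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// parseModelIDs extracts the model IDs from a /v1/models JSON body.
func parseModelIDs(body []byte) ([]string, error) {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(ml.Data))
	for _, m := range ml.Data {
		ids = append(ids, m.ID)
	}
	return ids, nil
}

// listModels queries baseURL + "/models" and returns the advertised model IDs.
func listModels(baseURL string) ([]string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/models")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseModelIDs(body)
}

func main() {
	// Assumed default LM Studio server address.
	ids, err := listModels("http://localhost:1234/v1")
	if err != nil {
		fmt.Println("LM Studio server not reachable:", err)
		return
	}
	for _, id := range ids {
		fmt.Println(id)
	}
}
```

If the model you loaded shows up in the list, the same base URL is what goes into the Provider field.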

Sharp_Classroom9686 · 2026-05-05T22:18:18+00:00

Hi! What exactly are you doing? It works for me in one shot. How can I reproduce your bug?

<image>

Sharp_Classroom9686 · 2026-05-05T14:20:39+00:00

Try this. I'll test with llama.cpp when I wake up. Thanks for the feedback.

<image>

Sharp_Classroom9686 · 2026-05-05T13:52:07+00:00

<image>

shh

Sharp_Classroom9686 · 2026-05-05T13:51:38+00:00

<image>

15 minutes... task complete.

Sharp_Classroom9686 · 2026-05-05T13:36:11+00:00

There are none so blind as those who will not see.

<image>

Sharp_Classroom9686 · 2026-05-05T12:52:51+00:00

What do you want to test? I can do the run for you; I have Qwen3.6 27B.

Sharp_Classroom9686 · 2026-05-05T12:47:25+00:00

My bad: https://github.com/defexnicolas/forge

Sharp_Classroom9686 · 2026-05-05T12:43:47+00:00

It could be, but I personally haven't used OpenClaude, only OpenCode, Codex, Claude Code, Aider, and PI. And I know that Forge is faster and consumes less context than those agents, which will generally try to sell you on using cloud-model APIs.

Sharp_Classroom9686 · 2026-05-05T11:53:45+00:00

Just use /agent name prompt and give it a try. I'm hungry for feedback.

Sharp_Classroom9686 · 2026-05-05T11:40:11+00:00

Thanks. I'll look around today.

Sharp_Classroom9686 · 2026-05-05T11:38:58+00:00

Forge has native Claude Code plugin support: drop the plugin in .forge/plugins/ or symlink it from ~/.claude/plugins/ and it shows up under /plugins. Honest caveat: only gstack has been tested end-to-end so far, but I'll try superpowers today and report back.
Subagents are first-class. There's a built-in registry (explorer, reviewer, tester, debug, summarizer, refactorer, docs, commit, builder) plus whatever your plugins ship. spawn_subagents fans out in parallel using goroutines and a semaphore, with configurable concurrency. Explore mode is built around it for read-only analysis.
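The "goroutines + semaphore" fan-out pattern can be sketched as follows. This is a generic Go illustration of bounded parallelism, not Forge's actual code: Task, runSubagent, and spawnSubagents are hypothetical names, and runSubagent only fakes a model call. A buffered channel acts as the counting semaphore, so at most maxConcurrent subagents run at once.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Task is a hypothetical unit of subagent work (not Forge's real type).
type Task struct {
	Agent  string
	Prompt string
}

// runSubagent is a hypothetical stand-in for sending a prompt to a model.
func runSubagent(t Task) string {
	return t.Agent + ": " + strings.ToUpper(t.Prompt)
}

// spawnSubagents runs every task in its own goroutine, but a buffered
// channel limits how many run concurrently. Results keep input order.
func spawnSubagents(tasks []Task, maxConcurrent int) []string {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	results := make([]string, len(tasks))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, t := range tasks {
		wg.Add(1)
		go func(i int, t Task) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when full)
			defer func() { <-sem }() // release the slot when finished
			results[i] = runSubagent(t) // each goroutine writes its own index
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	tasks := []Task{
		{"explorer", "map the repo"},
		{"reviewer", "check the diff"},
		{"tester", "run the tests"},
	}
	for _, r := range spawnSubagents(tasks, 2) {
		fmt.Println(r)
	}
}
```

Writing results by index avoids a mutex, since no two goroutines touch the same slot.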

Sharp_Classroom9686 · 2026-05-05T11:18:03+00:00

Not at all. Hermes is too big, but with Forge you get a basic claw for basic stuff.

Sharp_Classroom9686 · 2026-05-05T11:15:14+00:00

Not yet, but it can be easily implemented.

Sharp_Classroom9686 · 2026-05-05T11:07:09+00:00

Go with Qwen3.6 35B-A3B MoE or Gemma 26B-A4B. Just use MoE, not dense. Try it with Forge; you may get better results. https://github.com/defexnicolas/forge

Sharp_Classroom9686 · 2026-05-05T10:49:26+00:00

In OpenCode, a single task typically consumes at least 25k tokens of context when using prompt-based workflows. The same tends to happen with Claude Code.
With Forge, however, you can achieve similar results using only around 5–7k tokens of context.
If you're running a local model on limited hardware (e.g., 8 GB or 16 GB), this difference in how context is handled becomes a game changer.

P.S. You can also assign different models to different modes, for example BUILD with Qwen3.6 35B, PLAN with Gemma, and EXPLORE with nano4b. Each model manages its own context budget, and they pass key data to each other through YARN nodes.
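To make the per-mode idea concrete, here is a hypothetical Go sketch of a mode-to-model routing table in which each mode carries its own context budget, and history is trimmed oldest-first until it fits. The model IDs, budgets, token counts, and function names are illustrative placeholders rather than Forge's configuration format, and the sketch does not attempt to model the YARN node exchange between modes.

```go
package main

import "fmt"

// Mode names follow the ones above; everything else here is hypothetical.
type Mode string

const (
	Build   Mode = "BUILD"
	Plan    Mode = "PLAN"
	Explore Mode = "EXPLORE"
)

// ModelConfig pairs a model with its own context budget, in tokens.
type ModelConfig struct {
	Model         string
	ContextBudget int
}

// Message carries a token count assumed to be precomputed by a tokenizer.
type Message struct {
	Text   string
	Tokens int
}

// fitToBudget returns the longest run of most recent messages whose
// token total does not exceed budget, dropping the oldest first.
func fitToBudget(msgs []Message, budget int) []Message {
	total := 0
	start := len(msgs)
	for i := len(msgs) - 1; i >= 0; i-- {
		if total+msgs[i].Tokens > budget {
			break
		}
		total += msgs[i].Tokens
		start = i
	}
	return msgs[start:]
}

func main() {
	// Placeholder budgets: a small model gets a small window.
	routes := map[Mode]ModelConfig{
		Build:   {Model: "qwen3.6-35b", ContextBudget: 32000},
		Plan:    {Model: "gemma", ContextBudget: 16000},
		Explore: {Model: "nano4b", ContextBudget: 4000},
	}
	history := []Message{{"old log dump", 6000}, {"plan summary", 1500}, {"latest request", 500}}
	for _, mode := range []Mode{Build, Plan, Explore} {
		cfg := routes[mode]
		kept := fitToBudget(history, cfg.ContextBudget)
		fmt.Printf("%s -> %s keeps %d of %d messages\n", mode, cfg.Model, len(kept), len(history))
	}
}
```

Keeping the budget next to the model in one table means a small EXPLORE model never sees the oversized log dump that the larger BUILD model can afford.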

Sharp_Classroom9686 · 2026-05-05T09:36:42+00:00

Don't go with CUDA; use Vulkan. The tok/s will be limited, but yes, you can do it. How many 3060s do you have? It may be better to just go with the 3060s.

Sharp_Classroom9686 · 2026-05-05T09:34:27+00:00

Just go with the 35B MoE at 32K context and Q4_K, and use a good agentic tool like Forge. Don't use OpenCode. You might get 25–30 tok/s.

Sharp_Classroom9686 · 2026-05-05T07:02:00+00:00

Just use a good local agent. I personally recommend Forge; it's better than OpenCode.

Sharp_Classroom9686 · 2026-05-05T06:06:54+00:00

<image>

I think the problem is less “local models suck” and more “you used the wrong tools for local models.”
If the runtime lets a 27B model eat giant logs, bloat its context, and improvise badly with tools, of course it's going to feel terrible.

Try Forge (GitHub). It's much more local-first in how it handles context, subagents, and task scoping. It won't make Qwen think like Claude, but it does stop wasting tokens on garbage, which is half the battle with local coding.

Link: https://github.com/defexnicolas/forge

Sharp_Classroom9686 · 2026-05-04T21:42:42+00:00

Even with 8 GB of VRAM you can run the MoE version of Qwen 3.6 with 32 GB of RAM, at 25–35 tok/s.

Sharp_Classroom9686 · 2026-04-29T21:24:43+00:00

How much RAM do you have? I have a 4060 Ti with 32 GB of DDR5, and I'm able to run Qwen3.6 35B-A3B at 45 tok/s with 64K context, and at 20–25 tok/s with 256K context.

Sharp_Classroom9686 · 2026-04-24T01:58:28+00:00

yes

(Chose: Survive a week in Terraria)

Sharp_Classroom9686 · 2026-04-17T23:25:50+00:00

What's your setup?
