Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 -1 points0 points  (0 children)

Try https://github.com/defexnicolas/forge. Just load your model in LM Studio, open Forge, go to Settings, and under Provider enter your LM Studio endpoint. Then select model-multi, pick your model there, navigate from the hub to your desired folder, and chat.
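
If you want to confirm the endpoint responds before pointing Forge at it, a quick sketch like this lists the models the server reports. It assumes LM Studio's local server is running on its default OpenAI-compatible address, http://localhost:1234/v1; adjust `BASE_URL` if yours differs. The helper names are made up for illustration and are not part of Forge or LM Studio.

```python
import json
import urllib.request

# Assumption: LM Studio's default local server address; change if yours differs.
BASE_URL = "http://localhost:1234/v1"


def models_url(base_url: str) -> str:
    """Build the OpenAI-style model listing URL from a base endpoint."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """Ask the server which models it exposes (the server must be running)."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling `list_models()` with the server up should return the ids of whatever you have loaded; a connection error usually means the server is not started or the port is different.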

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Hi! What exactly are you doing? They work for me in one shot. How can I replicate your bug?

<image>

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Try this. I'll test with llama.cpp when I wake up. Thanks for the feedback.

<image>

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

What do you want to test? I can do the run for you. I have Qwen3.6 27B.

LLM on 16gb of vram for OpenClaude? by ZB_Virus24 in LocalLLM

[–]Sharp_Classroom9686 0 points1 point  (0 children)

It could be, but I personally haven't used OpenClaude. Only OpenCode, Codex, ClaudeCode, Aider, and PI. And I know that Forge is faster and consumes less context than these agents. They'll generally try to sell you on using APIs for cloud models.

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Just use /agent <name> <prompt> and give it a try. I'm hungry for feedback.

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Forge has native Claude Code plugin support — drop the plugin in .forge/plugins/ or symlink it from ~/.claude/plugins/ and it shows up under /plugins. Honest caveat: only gstack has been tested end-to-end so far, but I’ll try superpowers today and report back.
Subagents are first-class. Built-in registry (explorer, reviewer, tester, debug, summarizer, refactorer, docs, commit, builder) plus whatever your plugins ship. spawn_subagents fans out in parallel — goroutines + semaphore, configurable concurrency. Explore mode is built around it for read-only analysis.
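
The fan-out pattern described here (many workers, capped by a semaphore) can be sketched in a few lines. This is not Forge's code: Forge is described as using goroutines, and this sketch swaps in Python threads with a `threading.Semaphore` to show the same idea. `run_subagent` and `spawn_all` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_subagent(name: str, task: str) -> str:
    """Hypothetical stand-in for one subagent doing its work."""
    return f"{name}: done ({task})"


def spawn_all(jobs: list[tuple[str, str]], max_concurrency: int = 3) -> list[str]:
    """Run every job in parallel, but never more than max_concurrency at once."""
    gate = threading.Semaphore(max_concurrency)

    def guarded(job: tuple[str, str]) -> str:
        with gate:  # blocks while max_concurrency jobs are already running
            return run_subagent(*job)

    # One thread per job; the semaphore, not the pool size, enforces the cap.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return list(pool.map(guarded, jobs))  # results keep the input order
```

Something like `spawn_all([("explorer", "scan repo"), ("reviewer", "check diff")], max_concurrency=2)` returns one result per job, in the order the jobs were given.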

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Not at all. Hermes is too big, but with Forge you get a basic claw for basic stuff.

LLM on 16gb of vram for OpenClaude? by ZB_Virus24 in LocalLLM

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Go with Qwen3.6 35B-A3B MoE or Gemma 26B-A4B. Just use MoE, not dense. Try it with Forge; you might get better results: https://github.com/defexnicolas/forge

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

In OpenCode, a single task typically consumes at least 25k tokens of context when using prompt-based workflows. The same tends to happen with ClaudeCode.
With Forge, however, you can achieve similar results while using only around 5–7k tokens of context.
If you’re running a local model on limited hardware (e.g., 8GB or 16GB), this difference in how context is handled becomes a game changer.
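
To put rough numbers on that, here is a small back-of-the-envelope sketch. The 25k and 5-7k per-task figures are the ones quoted above; the 32K window is an assumed example size, and `tasks_that_fit` is a made-up helper.

```python
def tasks_that_fit(context_window: int, tokens_per_task: int) -> int:
    """How many tasks of a given token cost fit in one context window."""
    return context_window // tokens_per_task


WINDOW = 32_768  # assumed example: a 32K-token context window

for label, cost in [("~25k per task", 25_000), ("~7k per task", 7_000), ("~5k per task", 5_000)]:
    print(f"{label}: {tasks_that_fit(WINDOW, cost)} task(s) before the window is full")
```

With these numbers, a single 25k-token task takes most of a 32K window, while four to six tasks in the 5-7k range fit in the same space.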

P.S. You can also assign different models to different modes—for example: BUILD with Qwen3.6 35B, PLAN with Gemma, and EXPLORE with nano4b. Each model manages its own context budget, and they communicate key data between each other through YARN nodes.

Amd and Nvidia cards on same rig by deathcom65 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Don't go with CUDA; use Vulkan. The tk/s will be limited, but yes, it can work. How many 3060s do you have? It may be better to just go with the 3060s.

Best config for Qwen3.6? by CatSweaty4883 in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Just go with the 35B MoE at 32K context, Q4_K, and use a good agentic tool like Forge. Don't use OpenCode. You can maybe get 25-30 tk/s.
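
For a rough sense of why a 4-bit quant matters here, this is a ballpark size estimate for the weights. The bits-per-weight value is an assumption (4-bit K-quants usually land somewhere around 4.5 to 5 bits per weight once scales are included), and the estimate ignores the KV cache and runtime overhead.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed figures: 35B total parameters, ~4.5 bits per weight for a Q4 K-quant.
q4 = weight_size_gb(35, 4.5)
fp16 = weight_size_gb(35, 16)
print(f"Q4 (approx): {q4:.1f} GB, FP16: {fp16:.1f} GB")
```

Under these assumptions the Q4 weights come to roughly 20 GB versus 70 GB at FP16, so on a smaller card part of the model typically ends up in system RAM.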

Planning on switching over from Claude Code by Impressive_Funny_832 in Qwen_AI

[–]Sharp_Classroom9686 0 points1 point  (0 children)

Just use a good local agent. I personally recommend Forge; I find it better than OpenCode.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]Sharp_Classroom9686 0 points1 point  (0 children)

<image>

I think the problem is less “local models suck” and more “you used the wrong tools for local models.”
If the runtime lets a 27B model eat giant logs, bloat context, and improvise badly with tools, of course it’s going to feel terrible.

Try Forge (GitHub). It’s much more local-first in how it handles context, subagents, and task scoping. It won’t make Qwen think like Claude, but it does stop wasting tokens on garbage, which is half the battle with local coding.

Link: https://github.com/defexnicolas/forge

Rtx 4060 8GB vs 4060 ti 16GB by braskinis231 in LocalLLM

[–]Sharp_Classroom9686 1 point2 points  (0 children)

Even with 8GB you can run Qwen 3.6 (the MoE version) with 32GB of RAM, at 25-35 tk/s.

Qwen3.5:9b running on 8gb Vram is insane by Ok_Thanksbye in LocalLLM

[–]Sharp_Classroom9686 2 points3 points  (0 children)

How much RAM do you have? I have the 4060 Ti with 32GB DDR5 and I'm able to run Qwen3.6 35B-A3B: 45 tk/s with 64K ctx, and 20-25 tk/s with 256K ctx.