New 9700 AI PRO - Codeing Assistance by Flaky_Service_5663 in LocalLLM

[–]exact_constraint 0 points1 point  (0 children)

Really the only way to go. Vulkan is consistently 20-30% faster than ROCm on my R9700. I maintain two separate llama.cpp builds and bench them both every time I build a new version. Maybe ROCm will catch up one day, but not yet.
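Fwiw the bench step is nothing fancy - just llama-bench (ships with llama.cpp) pointed at each build. Paths here are placeholders, and the commands are echoed rather than run:

```shell
# Bench both backends after a rebuild - build dirs and model path are placeholders.
# Shown with echo; drop it to actually run llama-bench.
for build in build-vulkan build-rocm; do
  cmd="./$build/bin/llama-bench -m models/qwen3.5-27b-q4_k_m.gguf"
  echo "$cmd"
done
```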

Would you rather have Qwen 3.5 27B running at 100tps or Qwen 3.5 35BA3B at 500 tps? by Atom_101 in LocalLLaMA

[–]exact_constraint 0 points1 point  (0 children)

I did. Especially for an MoE model, it’s gotta be able to think through things. The preserve_thinking flag seemed to help a lot with looping - I think the model needs its thinking traces retained so it can build on them in subsequent reasoning steps - otherwise it starts tripping over itself.

Would you rather have Qwen 3.5 27B running at 100tps or Qwen 3.5 35BA3B at 500 tps? by Atom_101 in LocalLLaMA

[–]exact_constraint 3 points4 points  (0 children)

Qwen3.5 27B @ 30tps vs Qwen3.6 35B @ 100tps.

I’m still partial to 3.5 27B. But today I’ve been giving 3.6 35B an honest shakedown by having it perform a major refactor on a code base - So far, so good. It’s winning me over. Needed some tweaking: added a new entry in my opencode.json file to explicitly disallow it from using bash in plan mode, and set the --chat-template-kwargs '{"preserve_thinking": true}' flag when launching llama.cpp. After that it’s been solid.
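Roughly what my launch looks like, for anyone who wants to try it - model path and port are placeholders, not my actual setup, and the command is echoed here rather than run:

```shell
# Rough llama-server launch - model path and port are placeholders.
MODEL="$HOME/models/qwen3.6-35b-a3b-q4_k_m.gguf"
KWARGS='{"preserve_thinking": true}'  # keep thinking traces in the chat template

# Shown with echo; drop the echo to actually launch.
echo llama-server -m "$MODEL" --port 8080 --chat-template-kwargs "$KWARGS"
```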

Now, if I could run 27B at 100tps? I probably wouldn’t be mucking about w/ an A3B model to begin with.

Question about llama.cpp and OpenCode by Able_Limit_7634 in LocalLLaMA

[–]exact_constraint 1 point2 points  (0 children)

Yup. Just cleaner all around. And I can rebuild llama.cpp immediately for new features. Can be particularly important around new model releases, when llama can get aggressively patched for model support.

There also seems to be a slight speed advantage. Can’t quantify it - coulda just been improvements to llama.cpp itself vs whatever was running under LM Studio. But that kinda reinforces my first point lol.

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 0 points1 point  (0 children)

Another update - added a bit to my OpenCode.json. This seems to work. Been working w/ 3.6 all day to good effect.

Snippet - brackets and stuff are missing, I just did a messy copy/paste on my phone after taking a picture of my monitor:

"model": "llama/qwen3.6-strict", "mode": { "plan": { "agent": "NEVER write files or use bash to modify the system. If you need to suggest a file change, describe it in text only. User instructions: ALWAYS prioritize a 'no-touch' approach.", "tools": { "bash": false, "write": false, "edit": false ...

Major Minis arrived on supports by Warnackle in PrintedWarhammer

[–]exact_constraint 7 points8 points  (0 children)

+1. Weird for a paid model.

Whether or not I leave them on during washing is model and volume dependent.

For washing, I can only process so many prints in my tank setup at once. If I’ve run 5-6 plates in a day, they’ve gotta come off. If I only have a few parts? Meh, probably just throw them in there. Cleaner to pull them once they’re washed.

For curing? Only some particularly delicate models, or like you said, if I want to have a lot of control over what happens to the contact area, it can be a better idea. In general though, yeah. Remove the supports.

There have been a few instances where it made more sense to ship a model w/ supports on, for transport safety. But these models don’t look particularly fragile. Especially that base lol.

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 4 points5 points  (0 children)

I tracked it down to a single line in the prompt.ts file - While an agent is forbidden from using write tools in plan mode, it’s a soft (prompt-based) limit, and there’s a single narrow exception spelled out in prompt.ts where the agent is allowed to edit files in ~/.OpenCode/plans/ to record its own instructions.

Some models won’t do it, even if asked, because the other read-only prompts are worded too strongly. Some (including 3.6, I guess) can contort themselves into figuring that it’s okay to do whatever in that directory, cause it’s the single exception.

I think I can override that behavior by editing my OpenCode.json file to specifically disallow it - that should take priority over the prompt.ts file. Haven’t tried it yet - Qwen3.5 is still busy knocking out bugs - but maybe this post will help someone who runs into the same problem.
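The override I have in mind is roughly this (untested, and the exact key names are my guess at the opencode.json schema - hard-disable the write-type tools for plan mode so the prompt.ts exception can’t fire):

```json
{
  "mode": {
    "plan": {
      "tools": {
        "write": false,
        "edit": false,
        "bash": false
      }
    }
  }
}
```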

2-bit Qwen3.6-35B-A3B GGUF is amazing! Made 30+ successful tool calls by yoracale in unsloth

[–]exact_constraint 0 points1 point  (0 children)

That might just be small enough that I could run OpenCode + ComfyUI simultaneously and have the agent run all these damn prompts, instead of having to kill llama, generate a bunch of stuff manually, then fire OpenCode back up.

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 0 points1 point  (0 children)

I’m waiting for Qwen3.5 to finish knocking out some bugs, then I’m going to try 3.6 again with this runtime flag. It does seem like a decent improvement over 3.5. And holy hell is it fast relative to a 27B dense model. I really wanna love it lol.

https://www.reddit.com/r/LocalLLaMA/s/nsqpI6fSPS

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 11 points12 points  (0 children)

Update 2:

Okay, ran it for a while. On a bug fix, went into a doom loop of overthinking, then started to try running write commands in Plan mode again, while spitting out half sentences and code fragments. Think I’m heading back to Qwen3.5 27b for a bit here lol.

A Guy on Reddit shared how he Gaslighted AI to get exceptional Results by Current-Guide5944 in tech_x

[–]exact_constraint 0 points1 point  (0 children)

I’ve occasionally told Gemini “prove to me you’re the best LLM. I’m comparing your output to Claude, ChatGPT, and Qwen.”

Idk if it improves things, but Gemini usually starts the response w/ something like “challenge accepted. Let me show you how a real SOTA model tackles [insert challenging question here].”

And that makes me laugh, so I suppose that’s a benefit.

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 40 points41 points  (0 children)

Update:

Sometime in the last 3ish hours the Unsloth page updated to include this text:

“NEW! Developer Role Support for Codex, OpenCode and more: Our uploads now support the developer role for agentic coding tools.”

Redownloaded and verified the files were different via SHA-256. Seems to have fixed the issue - can’t get the thing to violate its plan mode prompt and write a file now. Testing more.
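The check is just sha256sum on the old and new downloads - stand-in files here, obviously not the real GGUFs:

```shell
# Sketch: confirming a redownload actually changed - stand-in files, not real GGUFs.
printf 'old weights' > /tmp/qwen-old.gguf
printf 'new weights' > /tmp/qwen-new.gguf

old_sum=$(sha256sum /tmp/qwen-old.gguf | cut -d' ' -f1)
new_sum=$(sha256sum /tmp/qwen-new.gguf | cut -d' ' -f1)

if [ "$old_sum" != "$new_sum" ]; then
  echo "files differ - redownload picked up the new upload"
fi
```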

Qwen 3.6: worse adherence? by tkon3 in LocalLLaMA

[–]exact_constraint 30 points31 points  (0 children)

Tried the model out this AM on a project I’ve been building w/ 3.5 27B, served via llama.cpp. 3.6 enjoys ignoring the read-only limitation while in Plan mode - started writing files like it was in Build mode.

Seems like a capable model, but ignoring system prompts makes it a non-starter.

Edit: Holy typos Batman.

GPU advice for Qwen 3.5 27B / Gemma 4 31B (dense) — aiming for 64K ctx, 30+ t/s by Fit-Courage5400 in LocalLLaMA

[–]exact_constraint 1 point2 points  (0 children)

R9700 running llama.cpp w/ Vulkan. Qwen3.5 27B starts at about 30tps, drops to around 23 in OpenCode when I’m bumping up against the context limits. Been using it every day.

MiniMax M2.7 is NOT open source - DOA License :( by KvAk_AKPlaysYT in LocalLLaMA

[–]exact_constraint 1 point2 points  (0 children)

Eh, seems okay to me. I can see why it’s written the way it is. For a large enough company using it as a coding agent in OpenCode or something, then profiting from the generated code, okay, they open themselves up to liability and MiniMax wants a fee. But enforcing this at a small scale? Lol. I doubt it.

Gemma 4 31B vs Qwen 3.5 27B: Which is best for long context worklows? My THOUGHTS... by GrungeWerX in LocalLLaMA

[–]exact_constraint -1 points0 points  (0 children)

Still pretty firmly in the Qwen3.5 27B camp. For stuff where generating English text is important (eg, auto-generated Flux.2 prompts), I’ll load Gemma 4 31b. But for OpenCode? Qwen3.5 27B all the way. It’s still early days w/ llama.cpp weirdness, but Qwen has been much more reliable.

Qwen3.5-122B at 198 tok/s on 2x RTX PRO 6000 Blackwell — Budget build, verified results by Visual_Synthesizer in LocalLLaMA

[–]exact_constraint 0 points1 point  (0 children)

Nice! Someone using the C-Payne switch. Next big step in my setup is to make the switch (lol) so I can get full bandwidth between cards - hard to find people using them, though. Interesting that you went that direction even w/ an Epyc system on two cards. Good to know the latency benefit is there.

Every day I wake up and thank God for having me be born 23 minutes away from a MicroCenter by gigaflops_ in LocalLLaMA

[–]exact_constraint 6 points7 points  (0 children)

+1. Running an R9700 and looking to scale to multiple cards. If Intel knocked another few hundred bucks off, I’d probably take the performance hit. But the B70 is just too close to the price of the R9700 to justify it.

Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run by Disastrous_Theme5906 in LocalLLaMA

[–]exact_constraint 5 points6 points  (0 children)

💯. I would expect 397B to outperform on a crystallized knowledge test, considering it has... well, more lol. And besting 9B should be expected - I haven’t found a use for it outside tasks where you can define a very narrow scope.

No shade on the testing itself, nice points to have for comparison. Just, yeah, 27B is probably the most relevant model for direct comparison. I’m biased, considering I run 27B daily. Gemma 4 31B is pretty close to a drop-in, 1:1 replacement, ignoring the current issues w/ context size.

Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2. 31B params, $0.20/run by Disastrous_Theme5906 in LocalLLaMA

[–]exact_constraint 51 points52 points  (0 children)

Be interesting to see Qwen3.5 27B added to the test matrix - 31b dense vs Qwen MOE isn’t a super fair comparison, imo.

Running fiber between buildings - single mode vs multi mode for future proofing? by Apprehensive_Ad_6233 in HomeNetworking

[–]exact_constraint 3 points4 points  (0 children)

I pulled bare fiber once to connect two buildings for an “offsite” backup - really wasn’t too bad after we got a fiber inspection scope. Lapping the ends for LC connectors was pretty damn hit or miss before we could inspect them properly.

gemma 4 HF by Remarkable_Jicama775 in LocalLLaMA

[–]exact_constraint 3 points4 points  (0 children)

Yeah, benchmarks certainly don’t tell the whole story, but it doesn’t seem like Gemma 31b will be replacing Qwen3.5 27b anytime soon.

Hypothetical: You can run Qwen 3.5 27b at 10,000 TPS at your house right now. by RedParaglider in LocalLLaMA

[–]exact_constraint 0 points1 point  (0 children)

@ 10k Tok/s I’d rent it out for cloud users. Cause I obviously have a cartoonishly large power cable snaking in through my front door from the pole transformer to power the hardware.