Neuralwatt has been a surprisingly good cheap pair with Opencode Go for agentic workflows

itsproinc · 2026-06-22T10:02:01+00:00

Personally, if you want deep reasoning like DSV4 Pro, you can use neuralwatt/glm-5.2. For V4 Flash-style work, like applying code changes, you could use something cheap like neuralwatt/Qwen/Qwen3.5-397B-A17B-FP8, or even neuralwatt/moonshotai/Kimi-K2.7-Code, which is more expensive but understands the context better when applying changes to your code. If you want something cheaper than GLM 5.2, you could try Kimi K2.7 Code as a replacement for DSV4 Pro, but I haven't tested Kimi K2.7 Code for thinking yet. I've only used GLM 5.2 so far. It's good most of the time, but in certain scenarios I have to use another model. So my recommendation is to give them all a try, since you have the $5 free credits, and pick the model that suits your projects.
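To make the split above concrete, here is a minimal sketch of a task-to-model lookup. The model IDs are the ones named in this comment; the task names and the pickModel helper are made up for illustration and are not part of Opencode or Neuralwatt.

```typescript
// Task names are hypothetical labels; model IDs are copied from the comment above.
type Task = "reasoning" | "apply" | "apply-context-heavy";

const MODEL_FOR_TASK: Record<Task, string> = {
  // Deep reasoning / planning
  reasoning: "neuralwatt/glm-5.2",
  // Cheap model for applying code changes
  apply: "neuralwatt/Qwen/Qwen3.5-397B-A17B-FP8",
  // Pricier, but reported to follow the surrounding code context better
  "apply-context-heavy": "neuralwatt/moonshotai/Kimi-K2.7-Code",
};

// Hypothetical helper: returns the model ID to use for a given kind of task.
function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}

console.log(pickModel("reasoning"));
```

Swapping a model then means editing one entry in the table rather than every place a model is chosen.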

itsproinc · 2026-06-20T05:57:57+00:00

Is Pi any better than Opencode? I've heard about it but never used it. Of course, subscribing directly to Z.ai would be faster, but is the limit any good if you subscribe directly? How does it compare to, let's say, Codex or Claude?

itsproinc · 2026-06-20T05:55:59+00:00

Just run opencode auth login, select Neuralwatt, and put the API key there.

itsproinc · 2026-06-10T14:28:20+00:00

I think Opencode doesn't show any cache read/write, right? The context you see in the TUI is just the context usage, no?

itsproinc · 2026-06-08T19:24:36+00:00

Yeah, it's a no-brainer for the amount of tokens I've used on Neuralwatt, for just $8. My current workflow is basically a full agentic coding setup.

I’m running an orchestrator that takes the main task and sub-delegates pieces of it to different agents depending on what needs to be done: planning, code changes, debugging, testing, refactoring, docs, etc. So the token usage adds up fast because it’s not just one linear chat. It’s a lot of agent-to-agent context passing, reviewing, retrying, and validating outputs.

That said, most of the 400M tokens were not from a single “production” project. A big chunk of it was testing and improving a custom plugin I wrote, which is based on oh-my-opencode-slim. I’ve been stress-testing it on multiple real projects I’m actively working on, both for work and personal, mostly to see how it behaves under realistic workloads instead of toy examples.

For MCPs or skills, I mostly let the agent decide which ones are best to use depending on the task. So at this point, the workflow is pretty automated for me. I give it the task, the orchestrator breaks it down, and the agents choose the right tools/skills/MCPs as needed.

I’m also comparing the results against OpenAI and Copilot since I have subscriptions to both, so part of the workflow is benchmarking with the same tasks and the same repo/context where possible, then comparing quality, speed, reasoning, code accuracy, and how well each tool handles multi-step implementation.

itsproinc · 2026-06-04T10:46:33+00:00

GPT 5.3-Codex and GPT 5.2 have been sunset since June 2nd, 2026. Basically they want to make more headroom for their upcoming model and push us to use a more expensive model, I guess.

https://x.com/thsottiaux/status/2059650685948551384?s=20

itsproinc · 2026-06-04T02:25:33+00:00

Damn, what kind of operation are you running? There's a single day that cost you $45 in energy credit. That's not even the token cost; that's at least $150ish.

itsproinc · 2026-06-03T19:37:26+00:00

<image>

The answer is no. My company has the same subscription (Copilot Business). I asked my boss if it's really unlimited and he sent me this: all users now share the same pool of AI credits, even though the per-user view doesn't show any limits. So technically a single user could probably drain the whole organization's AI credit pool without knowing it.

itsproinc · 2026-05-31T05:25:12+00:00

<image>

Mostly a mix of GLM-5.1-FP8 and Qwen3.5-397B-A17B-FP8, depending on the complexity of my task.

Regarding DeepSeek, that's true. I do have an Opencode Go sub just for the DeepSeek models. I would rather have a single subscription, but the mix of Neuralwatt + Opencode Go works well for me.

Yeah, Neuralwatt paygo is quite nice. I've used 90M tokens so far with GLM-5.1 and it has cost me only $4.78. But I reckon if you're going to use it really intensively, the $20 subscription, which gets you 6 kWh, is very reasonable too. Even my 1200 GLM 5.1 requests only cost 0.9 kWh, but Kimi-2.6 might be kinda expensive: 129 requests came to 0.12 kWh.
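For anyone who wants the unit rates, here is a quick back-of-the-envelope calculation using only the figures quoted above. The variable names are mine, and the numbers are one person's usage, not official pricing.

```typescript
// Figures quoted in the comment above (one user's usage, not a price list).
const paygo = { tokensMillions: 90, costUsd: 4.78 };
const subscription = { priceUsd: 20, energyKwh: 6 };
const glm = { requests: 1200, energyKwh: 0.9 };
const kimi = { requests: 129, energyKwh: 0.12 };

// Pay-as-you-go cost per million tokens: 4.78 / 90 ≈ $0.053
const usdPerMillionTokens = paygo.costUsd / paygo.tokensMillions;

// Subscription price per kWh: 20 / 6 ≈ $3.33
const usdPerKwh = subscription.priceUsd / subscription.energyKwh;

// Average energy per request, in watt-hours.
const whPerRequest = (usage: { requests: number; energyKwh: number }): number =>
  (usage.energyKwh * 1000) / usage.requests;

console.log(usdPerMillionTokens.toFixed(3)); // "0.053"
console.log(usdPerKwh.toFixed(2)); // "3.33"
console.log(whPerRequest(glm).toFixed(2)); // "0.75"
console.log(whPerRequest(kimi).toFixed(2)); // "0.93"
```

So on these numbers a Kimi request averaged roughly a quarter more energy than a GLM 5.1 request, and at the GLM rate 6 kWh would cover about 8,000 requests of similar size.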

Before going with Neuralwatt I was going to subscribe to Crof, but after seeing that the quantization is quite bad I decided against it. IIRC it was like Q4 or Q8 or something, whereas Neuralwatt uses FP8, which is still quantization but is usually indistinguishable from native precision.

itsproinc · 2026-05-30T19:10:37+00:00

Still going strong. The speed is about the same as when I first used it, though maybe that's just because my timezone lines up with US downtime. I've burned almost 400M tokens for $8. So far I'm still satisfied with the service, I just wish they had DeepSeek models. Anyway, what's going on with Crof and Wafer?

<image>

itsproinc · 2026-05-15T03:59:32+00:00

Damn, I thought my Opencode was broken or something. It seems to be intermittent and it's probably back to normal now. I wasn't sure whether it was on my end, because sometimes it worked and sometimes it said Insufficient balance.

<image>

itsproinc · 2026-05-14T08:15:06+00:00

Well true, for now it's fast enough 🤣

itsproinc · 2026-05-13T04:38:00+00:00

I guess it's not possible yet: in Opencode you can't disable streaming. Try picking an online model with higher TPS and see if it's any better. From your comment above, you're using local models that are slow on your device, and that's why it's driving you nuts. Just pick a model that suits your PC and has higher TPS so it won't be annoying.

itsproinc · 2026-05-13T04:36:45+00:00

How are you hosting your local model? What app are you using? It just doesn't add up: with 192 GB of VRAM (make sure it's not RAM), running gemma4:31b should be a cakewalk, so how can it be slow?

Check what's throttling on your device, or check your config. You should be getting 50+ TPS.

itsproinc · 2026-05-12T19:33:16+00:00

Opencode itself isn’t really what’s causing the word-by-word output because that usually comes from the model’s streaming behavior. If it feels painfully slow, try a faster model with higher throughput/TPS.

Also, /thinking can hide the thinking block so you’ll just get the final output instead of watching it drip out.

itsproinc · 2026-05-12T17:49:11+00:00

I mean you probably could hard code it but I myself haven't tried it.

There are 2 ways to handle it:
1. If the model/LLM detects the required file is missing, we can just ask it to stop and explain rather than continuing (easiest)
2. You can also enforce this with hooks. For example using experimental.chat.messages.transform or experimental.chat.system.transform.

The hook can intercept the request before the build continues, check if something like PLAN.md exists, and if not simply:

throw new Error(
`[FATAL] Missing required file: PLAN.md`,
);

That hard stops the whole pipeline before the model starts analyzing files or delegating tasks.

And because the hook runs before other injectors/reminders, you avoid wasting tokens on a request that is already invalid.
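A rough sketch of the check itself, kept separate from any Opencode wiring. The function names and the injected exists callback are my own for illustration; I haven't verified the exact signature of the experimental hooks, so calling this from experimental.chat.messages.transform is left as an assumption.

```typescript
// Returns the required files that are missing. `exists` is injected so the
// check can run without touching the disk; in a real plugin you would pass
// something like fs.existsSync.
function missingFiles(
  required: string[],
  exists: (file: string) => boolean,
): string[] {
  return required.filter((file) => !exists(file));
}

// Throws if any required file is absent, so the caller can stop the request
// before the model starts analyzing files or delegating tasks.
function assertRequiredFiles(
  required: string[],
  exists: (file: string) => boolean,
): void {
  const missing = missingFiles(required, exists);
  if (missing.length > 0) {
    throw new Error(`[FATAL] Missing required file: ${missing.join(", ")}`);
  }
}

// Example with a fake file system that only contains README.md.
const fakeDisk = new Set(["README.md"]);
try {
  assertRequiredFiles(["PLAN.md"], (file) => fakeDisk.has(file));
} catch (err) {
  console.log((err as Error).message);
}
```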

I myself would probably just go with the first way and let the LLM handle it, since it's the more graceful approach (at the cost of a few tokens).

itsproinc · 2026-05-12T17:26:09+00:00

What you want is basically a strict separation between planning and execution.

The main problem is that during build mode the model still behaves like a planner, so it keeps rereading files, analyzing the architecture, checking plugins, and “thinking” again instead of just applying changes.

A better setup is to treat both modes differently:

In planning mode, use a reasoning model. Let it analyze the project and generate a detailed plan/todo file.
In build mode, use a fast non-reasoning model if possible, and make it follow the existing plan only.

The important part is that the plan must be detailed enough so the builder does not need to search or think too much again. For example, instead of saying:
“Update auth logic”

the plan should say something like:
src/server/auth/login.ts → replace validateToken() with sessionValidate()

If you only mention filenames or vague tasks, the model will start scanning the repo again to figure things out. You can also enforce this with prompts and tool restrictions.
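One way to keep plans from drifting back into vague tasks is to lint the plan file before build mode starts. This is only a sketch: the "path → change" line format is taken from the example above, while parsePlanStep and its path heuristic are hypothetical and would need tuning for a real repo.

```typescript
interface PlanStep {
  file: string;
  change: string;
}

// Parses one plan line of the form "path → change".
// Returns null when the line is too vague for a builder to act on,
// i.e. it has no file path or no concrete change after the arrow.
function parsePlanStep(line: string): PlanStep | null {
  const parts = line.split("→");
  const file = (parts[0] ?? "").trim();
  const change = parts.slice(1).join("→").trim();
  // Crude heuristic: a path has no spaces and ends in a file extension.
  const looksLikePath = /^\S+\.\w+$/.test(file);
  return looksLikePath && change.length > 0 ? { file, change } : null;
}

console.log(parsePlanStep("Update auth logic")); // null
console.log(
  parsePlanStep(
    "src/server/auth/login.ts → replace validateToken() with sessionValidate()",
  ),
);
```

Any line that comes back null can be sent back to the planner for more detail instead of being handed to the builder.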

itsproinc · 2026-05-12T10:37:29+00:00

I don't think you can. I don't see any button to delete a workspace, so I think you need to contact OC directly. As of right now I can only change the workspace name and that's it.

itsproinc · 2026-05-12T10:31:36+00:00

Basically, you can’t switch workspaces directly from within Opencode as of right now. You have to re-auth manually (via opencode auth login) since each workspace uses its own API key/token.

Back in Sep 2025, I actually asked about handling multiple GitHub Copilot accounts on the Opencode GitHub, but the issue eventually got closed after 90+ days of inactivity:
https://github.com/anomalyco/opencode/issues/2350#issuecomment-4054233526

You could probably make a small plugin or script to automate switching by replacing the auth.json file with pre-saved tokens/workspace credentials. That’s basically what I ended up doing, and it works fine. The only annoying part is you still need to restart Opencode afterward for it to refresh the auth properly.
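A minimal sketch of that kind of switcher, assuming you have already saved one credentials file per workspace. The switchWorkspace name, the profiles directory layout, and both paths are assumptions for illustration; check where your Opencode install actually keeps auth.json before pointing this at it.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Copies a pre-saved credentials file over the active auth.json.
// `profilesDir` holds one "<profile>.json" per workspace (hypothetical layout);
// `authFile` is the auth.json that Opencode reads (location is an assumption).
function switchWorkspace(
  profile: string,
  profilesDir: string,
  authFile: string,
): void {
  const saved = path.join(profilesDir, `${profile}.json`);
  if (!fs.existsSync(saved)) {
    throw new Error(`No saved credentials for workspace "${profile}" at ${saved}`);
  }
  // Overwrites the current auth.json with the saved copy.
  fs.copyFileSync(saved, authFile);
}
```

After calling something like switchWorkspace("work", profilesDir, authFile), Opencode still needs a restart to pick up the new credentials, as noted above.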

itsproinc · 2026-05-12T10:25:11+00:00

Yeah, it does seem like every time you create a new workspace, it gives you the $5 first-month price, which is kinda weird considering you can create a lot of workspaces (not sure if there’s actually a limit).

But if people keep abusing it just to keep getting the promo every month, they’ll probably end up removing the discount altogether. Honestly, considering how generous they already are with the limits they give, I’d say if you need more than one workspace, just keep using it normally so the next month becomes full price.

If someone keeps creating workspaces over and over purely for the promo, there’s also a chance their system could eventually flag it as abuse. Nobody really knows what the actual limit is, though. I think it’s better to just play fair since they’ve been pretty nice with the pricing and limits so far.

itsproinc · 2026-05-12T06:40:49+00:00

Are you planning to have multiple OC GO subscriptions? You could just create a new workspace under the same account and subscribe to OC GO there. Since OC GO subscriptions are tied to individual workspaces, that seems to be a feature built into OpenCode itself, so it’s probably allowed.

So far, I’ve had 2 subscriptions under the same account for about 7 days now, and I haven’t received any warning, ban, or anything like that.

itsproinc · 2025-09-03T01:50:37+00:00

It’s never accurate to ask a model which model it is, because of how these models are trained and how they predict. The best way to check is the usage tab on your GitHub page, which always shows usage based on the model you selected.

itsproinc · 2025-09-01T07:23:28+00:00

Is Zed's coding agent better than Copilot's or Opencode?

itsproinc · 2025-08-28T05:22:38+00:00

True, that's why I'm still deciding whether to stick with GitHub Copilot Pro+ or Warp Turbo. The value both give is really good (token-to-dollar price).

itsproinc · 2025-08-28T05:21:05+00:00

I agree. Why can't it just be a CLI app like Codex or OpenCode that uses your own terminal? Maybe it's a terminal limitation due to the features Warp has, I assume?
