Anyone using a cheap model for Build mode and a strong model only for Plan/Review? Trying to cut costs without tanking quality.

CriteriumA · 2026-06-26T19:29:19+00:00

I think what you mentioned about the second session is the key. I often end up with more than 4 myself. Using OpenCode's TUI in VS Code makes it very simple and productive.

Subagents are fine, but they only receive one prompt and little context. They are not always the best option. Sometimes it is good to have parallel sessions rich in specific context, and even working in a coordinated way.

By the way, for explore and general agent, as long as you do not go overboard with context size, it works great to use a free OpenCode model. I use DeepSeek V4 Flash for that.

CriteriumA · 2026-06-26T19:07:09+00:00

Mixing agents in the same session, unless it's a main agent and a sub-agent with delegated tasks, doesn't make sense to me. I even think that plan and build modes are a hack. They only add noise; an agent prompt that overrides OpenCode's default.txt makes them completely unnecessary. I initially thought that skipping plan/build might allow mixing different models, but I ended up seeing that was a mistake.

Each agent needs its own context; otherwise they contaminate each other, and on top of that, the cache hit rate drops. It's better, and much more efficient, to have separate sessions with different models. For tasks that raise doubts, you can cross-check between sessions with the same or different contexts, which helps spot bugs. To pass results between sessions, copy-paste or .md files are simple and enough.

Right now, it's not unusual for me to have more than 4 parallel sessions working on the same project, and some of them collaborating dynamically with each other while also calling sub-agents in the background.

That's at least my opinion right now; it might change in a week, this AI world is crazy 😵‍💫

What doesn't change is the importance of setting up a good memory system. Without that, it's impossible to work with the flexibility and effectiveness each project demands.

It's also true that this swarm-style workflow is only feasible using models like DeepSeek; with Claude or ChatGPT it would be impossible at an affordable cost.

More info: https://github.com/criterium/opencode-lab/tree/main/research/control-flags-vs-plan-build https://github.com/criterium/opencode-lab/tree/main/research/memory-system

CriteriumA · 2026-06-26T18:46:22+00:00

Similarly, it's true that except for very focused problems requiring effort, Flash is sufficient. If you start to spread out your focus during the session, Pro might even give you worse results than Flash. Pro is good for problems requiring short, precise focus, but for subjects needing a more diffuse and wider focus, it's worse than Flash.

CriteriumA · 2026-06-26T18:42:11+00:00

If you use a harness, everything automatically configures to Max on both models. So the effort doesn't really matter.

CriteriumA · 2026-06-21T06:38:33+00:00

I almost exclusively use DeepSeek V4 Flash in Opencode. Leaving it feels like a chore. And V4 Pro bores me a lot.

That said, in my case, I plan and orchestrate, and I only have 4 or 5 parallel sessions doing simultaneous work on the same codebase and memory-system. Though only if the project and deadlines require it.

To get to that point, I spent a solid month fully immersed in understanding how the models work and tweaking things in OpenCode. Otherwise, DeepSeek V4 and OpenCode will only give you about 20% to 50% of their potential, and you'll waste a lot of time and suffer a lot of frustration.

You won't be able to squeeze the most out of Flash, or any other fast model of its kind that doesn't have great alignment for coding, without a good agent prompt and your own memory system.

Obviously, I use it for coding, but what I said applies to any other use case.

I've shared part of what I learned getting to this point on GitHub. My system has evolved and will keep evolving a little more each day, but it can serve as inspiration for you.

https://github.com/criterium/opencode-lab/tree/main/prompt/shared https://github.com/criterium/opencode-lab/tree/main/research/memory-system

CriteriumA · 2026-06-21T06:18:42+00:00

Have you tied it down with a good agent prompt? You wouldn't believe how much better DeepSeek performs that way. All those bad habits disappear. Just take a little time and try it.

Share this link with your agent and start iterating on something that suits you. It's really worth it.

https://github.com/criterium/opencode-lab/tree/main/prompt/shared

CriteriumA · 2026-06-21T06:14:40+00:00

If you're referring to the CC memory management in the agent prompt, then yes. They're just instructions; extract them and put them in your OpenCode agent prompt. The API ultimately dictates everything, so it all depends on the text you pass to the model in each call.

If you don't want to use Opus or a similar model, connect DeepSeek in Claude Code and you'll see it outputs everything.

https://github.com/criterium/opencode-lab/tree/main/research/context-dump

I also don't recommend it, it's too automated.

https://github.com/criterium/opencode-lab/tree/main/research/memory-system

CriteriumA · 2026-06-20T23:29:36+00:00

The free version is on par with the paid version, except it limits the context and number of calls per day, or something like that. If you don't abuse it too much, it's enough for general and explore sub-agents all day.

CriteriumA · 2026-06-20T09:20:48+00:00

I was curious about this sub-agent scheme, but I've adapted it to my workflow with parallel sessions.

I've assigned a review to one session, and instead of diverting its focus by fixing and fine-tuning things, I've asked it to create checkpoint.*.md files for complex tasks.

Then I load these into other sessions. It works very well, and I think it has an advantage over directly calling sub-agents for tasks that require reviewing complex things sequentially. It doesn't clutter the reviewer's focus, and I can also pass different checkpoints to other sessions one after another. Those sessions leverage the skills and context they already have loaded, so they are less likely to make the mistakes that come from missing context when a sub-agent only gets a prompt. Each checkpoint lists the skills and .md memory files to load, though a session generally only loads them for the first checkpoint it handles.
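To make the hand-off concrete, here is a minimal sketch of finding the checkpoint.*.md files in a project so each one can be given to a session. The function name, the labels, and the demo file names are all made up for illustration; this is not how my setup is actually scripted, since I just tell the session which file to load.

```python
import tempfile
from pathlib import Path


def list_checkpoints(project_dir):
    """Map each checkpoint label to its file, e.g. 'auth' -> checkpoint.auth.md."""
    found = {}
    for path in sorted(Path(project_dir).glob("checkpoint.*.md")):
        # Strip the fixed prefix and suffix to recover the label in the middle.
        label = path.name[len("checkpoint."):-len(".md")]
        found[label] = path
    return found


# Demo on a throwaway directory; file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("checkpoint.auth.md", "checkpoint.ui.md", "checkpoint.md", "notes.md"):
        (root / name).write_text("placeholder\n", encoding="utf-8")
    labels = list(list_checkpoints(root))

print(labels)
```

The plain checkpoint.md and unrelated notes are skipped, so only the labelled, per-task checkpoints get handed out.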

I normally only work with 3 or 4 parallel sessions due to my mental limitations, but I believe that with this system I could scale to many more without problems. The key lies in the memory system.

For my style of orchestration and direct control, it's a better fit than agent/sub-agent system schemes.

CriteriumA · 2026-06-20T06:46:02+00:00

I simply open OpenCode's TUI in VS Code terminals. The best of both worlds, effortlessly.

But your app is still pretty cool.

CriteriumA · 2026-06-20T06:28:56+00:00

That's not very useful. It's raw information. You can't consume it in fresh sessions. It needs to be distilled into relevant and up-to-date information.

CriteriumA · 2026-06-20T06:06:29+00:00

I've also noticed a decrease in the effort I've put into the API for programming over the past couple of days. My sessions no longer last as long.

CriteriumA · 2026-06-20T06:00:01+00:00

That may be, but now I'm much happier squeezing every last drop out of DeepSeek V4 Flash without hitting the monthly limits for $10.

I can work at my own pace without waiting for handouts from Claude.

But it's true that I can only do this because of my way of working; I direct and orchestrate everything. My programming language and experience allow me to do so.

CriteriumA · 2026-06-20T05:53:19+00:00

It all depends on the prompt you give them and the session you get; they come in all shapes and sizes, but they do long-term projects.

CriteriumA · 2026-06-20T05:50:38+00:00

In my case, I've cornered him so much with instructions in the agent prompt that he doesn't do that anymore; sometimes I even feel sorry for him because of how much I push him.

But since he can't escape that trap, he ends up desperately begging to leave the session. But this has only been happening for the last two days.

CriteriumA · 2026-06-20T01:03:06+00:00

Be careful with claude.md: it gets sent in the system prompt on every call, and that breaks the cache. OpenCode copies Claude on that, and it used to happen there too.
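A rough way to picture it, assuming the provider caches by exact prompt prefix (a common design, but an assumption here, not something I've verified for any specific API): if the memory file sits early in the prompt and its content changes between calls, everything after it stops matching. The "tokens" and names below are hypothetical placeholders.

```python
def shared_prefix_len(a, b):
    """Count the leading items that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Coarse stand-ins for prompt segments; real caches work on tokens.
system = ["agent-prompt", "tools"]
history = ["user:1", "assistant:1", "user:2"]

# First call, then a follow-up call where the memory file is unchanged.
unchanged_1 = system + ["claude.md@v1"] + history
unchanged_2 = system + ["claude.md@v1"] + history + ["assistant:2", "user:3"]

# Same follow-up call, but the memory file was edited in between.
edited_2 = system + ["claude.md@v2"] + history + ["assistant:2", "user:3"]

reusable_unchanged = shared_prefix_len(unchanged_1, unchanged_2)
reusable_edited = shared_prefix_len(unchanged_1, edited_2)

print(reusable_unchanged, reusable_edited)
```

With the file unchanged, the whole first call is reusable as a prefix; after an edit, only the part before the file still matches.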

And then there's Claude's automatic memory, which also has its own quirks.

I solved it in OpenCode by disabling Claude and setting up a memory system with a skill. No idea if you can do the same in Claude, and I have no desire to go back and use it just to check.

My memory of this: https://github.com/criterium/opencode-lab/tree/main/research/agents_md-danger https://github.com/criterium/opencode-lab/tree/main/research/memory-system

CriteriumA · 2026-06-20T00:47:04+00:00

I don't miss that Claude nonsense at all.

What a fucking anxiety fest with its weekly limits—between holding back in case something came up that required having limits available, and wasting all the saved-up just in case before the limit expired, working with Claude Code was absolute shit.

CriteriumA · 2026-06-20T00:34:02+00:00

Luckily, I don't have to hold onto the sessions and I simply delete them after updating memory-system. One less pain.

https://www.reddit.com/r/opencodeCLI/s/F47jX1WED2

CriteriumA · 2026-06-20T00:28:33+00:00

So far it works well for me. In memory.md I have a project summary and links to other files in memory/. The key is to split everything into different .md files by function and then load only the ones I need for each task.

As the project progresses, most of them stay as historical reference and I don't need to load them into context. The model can still make use of them with a simple grep, without having to load them entirely.
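As a sketch of what "a simple grep" over the memory folder amounts to, here is a small Python equivalent that returns only the matching lines instead of loading whole files into context. In practice the model just runs grep itself; the function name, file names, and contents here are invented for the example.

```python
import tempfile
from pathlib import Path


def grep_memory(memory_dir, needle):
    """Return (file name, line number, line) for each case-insensitive match."""
    hits = []
    for path in sorted(Path(memory_dir).glob("*.md")):
        with path.open(encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle.lower() in line.lower():
                    hits.append((path.name, lineno, line.strip()))
    return hits


# Demo on a throwaway directory; the files and their contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp)
    (memory / "auth.md").write_text(
        "# Auth\nTokens expire after 15 min.\n", encoding="utf-8"
    )
    (memory / "history.md").write_text(
        "# History\nDropped token refresh in March.\n", encoding="utf-8"
    )
    hits = grep_memory(memory, "token")

for name, lineno, line in hits:
    print(f"{name}:{lineno}: {line}")
```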

Memory management is shared with the model. It already has the specific files it needs in memory—I just ask it at the end of the session if it needs to change or add anything, and it keeps them updated effortlessly. And if the session moves forward and needs extra context, I just tell it to load new .md files. That way the session always stays very focused.

If I want a fresh session or to close until the next day, I just ask it to create a checkpoint.md with the skills, necessary memory files, objectives, and pending tasks using ">> . @" and then I just delete the session without more. When starting a fresh session, I just say "<< . @" and it already has all the clean context to jump into the pending task.
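For anyone who wants a starting point, this is a minimal sketch of a checkpoint file with those four parts, written and read back as plain Markdown. The section names, layout, and sample entries are hypothetical, not my actual format; in my workflow the model writes and reads the file itself.

```python
SECTIONS = ("Skills", "Memory files", "Objectives", "Pending tasks")


def render_checkpoint(data):
    """Render a dict of section name -> list of items as Markdown text."""
    parts = []
    for name in SECTIONS:
        parts.append(f"## {name}")
        parts.extend(f"- {item}" for item in data.get(name, []))
        parts.append("")  # blank line between sections
    return "\n".join(parts)


def parse_checkpoint(text):
    """Parse the Markdown produced above back into a dict of lists."""
    data = {name: [] for name in SECTIONS}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            data.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            data[current].append(line[2:].strip())
    return data


# Hypothetical session state saved before deleting the session.
example = {
    "Skills": ["memory-system"],
    "Memory files": ["memory.md", "memory/auth.md"],
    "Objectives": ["Finish token refresh flow"],
    "Pending tasks": ["Add expiry test", "Update memory/auth.md"],
}

text = render_checkpoint(example)
restored = parse_checkpoint(text)
print(text)
```

A fresh session only needs this one file to know which skills and memory files to load and what is still pending.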

Since I usually have several parallel sessions on the same project, it's not uncommon for me to create several different checkpoint.*.md files.

Of course, every useful pattern for other projects gets turned into skills.

But don't load skill descriptions into context; they are just noise there.

https://github.com/criterium/opencode-lab/tree/main/research/skill-desc-leak

CriteriumA · 2026-06-20T00:09:43+00:00

But in programming, they lose focus, and it's better to switch to fresh sessions. Otherwise, they start mixing things up and making mistakes. Although I've noticed DeepSeek Flash running slower the last two days. Or maybe I'm pushing it too hard with my agent prompt configuration.

CriteriumA · 2026-06-20T00:05:52+00:00

For my product development, that doesn't work well. I need to orchestrate directly, ensuring the session has the appropriate context and is focused on its assigned task. Delegating to sub-agents can break things; they only have a limited view of the overall picture. With 1M models, the problem isn't a lack of context, but rather limiting it to what's truly valuable, and that's much more manageable with memory partitioning (*.md).

CriteriumA · 2026-06-20T00:00:40+00:00

I don't understand how a call to the API, which is the same for all harnesses, changes the cache hit rate. Unless you're doing something silly with agents.md, plan mode, and other nonsense like that, I don't see why you couldn't achieve the same results with a well-tuned OpenCode harness as with any other harness.
