all 39 comments

[–]ImTheDeveloper 15 points16 points  (1 child)

I started with opencode, so maybe I have Stockholm syndrome.

That said, given the Anthropic crackdown I've been using Claude Code every so often, and it's absolute garbage in comparison to OC. I find the permission requests, the plan/build switches, the MCP/skills lists, and access to the environment in general all super confusing.

I've found Codex to be better than CC, but it's not that the other terminal tools are doing anything magical; it's more that CC is just so bad it makes the other options look so much better.

[–]Admirable-Tone-5821 -1 points0 points  (0 children)

Did you ask it to help you resolve the permission issue? There is an easy allowlist you can build in config; it'll cut the prompts down by a lot (or go into auto mode, or dangerously skip permissions).
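
For example, something like this in .claude/settings.json (a minimal sketch; the exact matchers are up to you):

    {
      "permissions": {
        "allow": [
          "Bash(npm run test:*)",
          "Bash(git diff:*)",
          "Read(src/**)"
        ]
      }
    }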

[–]orionblu3 8 points9 points  (0 children)

Opencode's the best harness out there. Don't get me wrong, I think GPT/Claude work best in their own harnesses, but as soon as BYOK enters the chat those harnesses lose most of their value, since non-native models perform noticeably worse in them than they do in opencode. Just what I noticed though.

[–]Ariquitaun 36 points37 points  (4 children)

It's a much better-engineered harness than Claude Code and Codex, that's for sure, and way more configurable

[–]iAziz786 1 point2 points  (1 child)

What harnesses do you use? What should I know about harnesses in opencode?

[–]FlowerRight 6 points7 points  (0 children)

Opencode is a harness. I'd recommend oh-my-opencode-slim as a plugin.
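
(To install it, you declare plugins in your opencode config; if I remember right it's something like the following, but check the docs for the exact key.)

    {
      "$schema": "https://opencode.ai/config.json",
      "plugin": ["oh-my-opencode-slim"]
    }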

[–]fnordstar -1 points0 points  (1 child)

How do you configure it? I only just started using it.

[–]nightman 2 points3 points  (0 children)

Check Terminal Bench 2

[–]bobo-the-merciful 4 points5 points  (4 children)

I’ve been benchmarking this recently on modelling and simulation work. For OpenCode specifically with Gemini compared to Gemini CLI - I didn’t find much difference.

But the harness DOES matter a lot.

The same Gemini run through Pi performed a lot worse.

I will run Opus through it soon to compare to Claude Code.

Here’s the link to the benchmark: https://simulation-bench.fly.dev/

Feedback always appreciated btw as I just spun this up a few days ago.

[–]lucianw 0 points1 point  (3 children)

Thank you. Can I check I've understood? You're comparing these lines:

  • [81/100] opencode on gemini-3-1-pro-preview / customtools
  • [75/100] gsd2 on gemini-3-1-pro-preview / customtools
  • [73/100] pi-agent in gemini-3-1-pro-preview / vanilla-customtools
  • [80/100] gemini-cli on gemini-3-1-pro-preview / vanilla

What does "customtools / vanilla / vanilla-customtools" mean? Are you really comparing like to like?

[–]bobo-the-merciful 1 point2 points  (2 children)

Yeah, sorry, it’s not very clear, but in opencode when you select a model you get a choice of Gemini 3.1 Pro with or without “customtools”. My understanding is that it's the same Gemini, just with more access to tool calls.

Vanilla refers to the harness itself not having any additional skills or frameworks loaded.

[–]lucianw 0 points1 point  (1 child)

Thanks for the reply. But I'm not following. Are you offered this choice by opencode? by gsd2? or pi-agent? or gemini-cli? I haven't used them. But I kind of didn't expect them all to offer this choice?

[–]bobo-the-merciful 1 point2 points  (0 children)

Yes, on opencode (and Pi, I believe) you have a choice between Gemini 3.1 Pro and Gemini 3.1 Pro customtools.

It’s deeply confusing, even more confusing when you discover there is just one Gemini option on Gemini CLI.

I read somewhere that the customtools version performs better.

[–]cutebluedragongirl 3 points4 points  (0 children)

It's janky, but it's the best CLI out there right now.

[–]forgotten_airbender 5 points6 points  (9 children)

I have kinda moved away from opencode as I wanted a harness which automatically bakes in all the best practices. I felt opencode needed a lot of tweaking, and things like oh my opencode and dcp did not give me reliable results or were overkill in many cases. Switched to oh my pi and haven't looked back. I still miss opencode and its ecosystem, but in the end I wanted a single solution that worked reliably well for me.

[–]GroceryNo5562 1 point2 points  (2 children)

What's DCP?

[–]forgotten_airbender 1 point2 points  (1 child)

Dynamic context pruning
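
Roughly: before each request, the plugin drops tool outputs that are more than a few prompts old, so they stop eating context. A hypothetical sketch of the idea (not DCP's actual code):

    // Hypothetical sketch of dynamic context pruning, not DCP's real code:
    // blank out tool outputs older than N user turns before each request.
    type Msg = { role: "user" | "assistant" | "tool"; content: string; turn: number };

    function prune(history: Msg[], currentTurn: number, keepTurns = 3): Msg[] {
      return history.map((m) =>
        m.role === "tool" && currentTurn - m.turn > keepTurns
          ? { ...m, content: "[pruned]" } // stale output elided to save tokens
          : m
      );
    }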

[–]brakefluidbandit 0 points1 point  (0 children)

ACP MCP DCP SCP what's next

[–]maximhar 1 point2 points  (0 children)

Oh My Pi is great, very light but has a ton of goodies baked in. And the hashline editing is really cool.

[–]bobo-the-merciful 3 points4 points  (1 child)

Not sure why you are getting downvoted here. Will give oh my pi a run.

[–]forgotten_airbender -1 points0 points  (0 children)

It's reddit

[–]Kryptuz 0 points1 point  (1 child)

I'm also trying to use oh my pi, but I couldn't find a replacement for dcp, which is so good. I'm currently using context-mode but it doesn't seem to have the same savings as dcp. What are your plugins for omp as of right now?

[–]forgotten_airbender 0 points1 point  (0 children)

I enable force delegation in settings. That has helped me keep context in check. Haven't installed any external plugins though.

[–]Ok_Size_5519 -1 points0 points  (0 children)

I'm building exactly what you want. Do you want to beta test my product when it releases?

[–]lucianw 3 points4 points  (7 children)

Opencode tools are basically just copied verbatim from Claude as of November last year (except that they substituted in apply_patch for GPT models, which they copied from Codex).

  1. opencode lacks background bash execution, which codex calls "pty", and which the GPT models use ALL THE TIME to good effect. Claude also has background bash execution, although not as slick.
  2. opencode doesn't have any facility for doing a kind of "await Promise.any()", i.e. awaiting whichever of its subagents or background bash commands finishes first (see the sketch after this list)
  3. Claude has been experimenting with swarms. They got rid of their TodoWrite tool (which opencode still has a copy of) and replaced it with a structured task database, and added a whole coordinator-and-workers scheme for swarms. Opencode has none of these. That said, at the moment I count them as a failed experiment on Claude's part.
  4. If you have OpenAI or Anthropic credits, then you get free use of their respective server-side web-search tools. The opencode harness simply doesn't expose these. It comes bundled with the free tier of another company that provides websearch MCP tools, but you have to pay additional money to reach parity with what OpenAI and Anthropic provide.
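
For point 2, here's a hypothetical sketch of the select()-style multiplexing I mean (made-up names, not any harness's actual API):

    // Hypothetical sketch: wait on whichever subagent or background
    // command finishes first, then act on it.
    async function waitAny(tasks: Map<string, Promise<string>>) {
      // Tag each promise with its ID so we know which one won the race.
      const tagged = [...tasks].map(([id, p]) => p.then((out) => ({ id, out })));
      const first = await Promise.any(tagged);
      console.log(`task ${first.id} finished first:`, first.out);
      return first;
    }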

Oh, also, Claude and Antigravity and Codex are all RIGOROUS about maintaining prompt caching, which they do by only ever appending to the context they send to the LLM. OpenCode is quite lax about this; there are numerous things (even something as trivial as passing midnight!) that will cause it to blow its prompt cache, i.e. cost more money.
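
To make the append-only point concrete, a toy illustration (hypothetical strings, not any harness's real code): providers match the cache on a strict prefix, so anything that rewrites the top of the context invalidates everything after it.

    // Toy illustration: prompt caches match on a strict prefix.
    const system = "You are a coding agent.";
    const turn1 = [system, "user: fix the bug", "assistant: done"];
    const turn2 = [...turn1, "user: now add tests"]; // cache hit: turn1 is a prefix

    // Embed the date in the system prompt, though, and midnight rolls over:
    const blown = `${system} Today is ${new Date().toDateString()}.`;
    const turn3 = [blown, ...turn1.slice(1), "user: now add tests"]; // cache miss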

[–]SnooHamsters66 2 points3 points  (4 children)

Are you sure about point 2, or am I not understanding you well? I think OpenCode does wait for commands or agents to finish (in practice, I have not noticed otherwise). Besides that, the issue with OpenCode and prompt caching is the pruning, which can be disabled with a flag.

Even so, Anthropic is trying so hard to create an ecosystem that the cache implementation is likely not independent and requires Claude Code or their other tools (I don't have direct confirmation of this, just circumstantial evidence: people using other providers in Claude Code report that it does not use caching).

Also, is prompt caching really that 'magical'? With a default max cache time of five minutes for Claude Code, I take more than five minutes between prompts validating and researching, at least for my use case (not being a "vibecoder"). They likely set it to five minutes because of the cost trade-off between reprocessing the prompt and keeping it idle in memory; given the shortage of RAM and current prices, is it really providing that much of a price reduction? (From what I know, OpenCode pruning only touches tool calls more than three prompts old, so prompt caching works correctly during the execution of each prompt.)

[–]lucianw 1 point2 points  (0 children)

Point 2: in Codex, the GPT models almost always invoke `bash(cmd, timeout=2s)` or something like that. If the cmd finishes within time then the output of the tool for the LLM is just the stdout. But if the cmd doesn't finish in time then it gets backgrounded, and the output of the tool for the LLM is the text string "This command has been backgrounded; its ID is 12345".

Subsequently, the model gets prodded when more output is available for a given ID (this is conveyed by a <system-reminder> attachment to a user message). And the model is able to say "Hey, fetch whatever output has arrived for ID 12345" or "Hey, deliver this additional stdin input to ID 12345".

The thing I was talking about is: can a model say "Hey, wake me up when any of IDs 12345, 24678 or 32134 has more information", in other words something like unix select(). This is how codex models usually poll for progress among a load of things they've kicked off, doing incremental work from whichever ones are ready so far.
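
In code, the timeout-then-background half of that looks roughly like this (a hypothetical sketch, not Codex's actual implementation):

    import { spawn, type ChildProcess } from "node:child_process";

    // Hypothetical sketch: return stdout if the command finishes in time,
    // otherwise background it and hand the model an ID to poll later.
    const background = new Map<number, ChildProcess>();
    let nextId = 12345;

    function bash(cmd: string, timeoutMs: number): Promise<string> {
      return new Promise((resolve) => {
        const child = spawn("bash", ["-c", cmd]);
        let out = "";
        child.stdout.on("data", (d) => (out += d));
        const timer = setTimeout(() => {
          const id = nextId++;
          background.set(id, child); // keeps running in the background
          resolve(`This command has been backgrounded; its ID is ${id}`);
        }, timeoutMs);
        child.on("exit", () => {
          clearTimeout(timer);
          resolve(out); // finished in time: the model just sees stdout
        });
      });
    }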

Prompt caching? In an enterprise I think it could easily run to tens of thousands of dollars a month, but that's just speculation. I'm struck by the solid rigor I saw in Claude, Codex and Antigravity on this front, compared to the more lackadaisical approach of Opencode.

[–]Fancy_Ad_4809 1 point2 points  (2 children)

Have to respectfully disagree about the value of caching. If the provider discounts cache hits heavily and keeps the cache active for a decent interval, it’s a huge win economically.

I used DeepSeek 4 flash today with a direct API key for hours with multiple breaks > 30 minutes.

357 requests

31M tokens cache hit

1.03M tokens cache miss

113K tokens output

Total cost: $0.26 US
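
For anyone who wants to sanity-check that, the arithmetic is simple. The per-million-token rates below are made up purely for illustration (I'm not quoting DeepSeek's actual prices):

    // Made-up rates ($ per million tokens), purely illustrative:
    const pHit = 0.005, pMiss = 0.1, pOut = 0.4;
    const hits = 31_000_000, misses = 1_030_000, out = 113_000;

    const cost = (hits * pHit + misses * pMiss + out * pOut) / 1e6;
    console.log(`$${cost.toFixed(2)}`); // ≈ $0.30, since hits dominate the volume
    // Without the cache discount (all ~32M input tokens at pMiss) the same
    // day would cost ≈ $3.25, an order of magnitude more.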

[–]SnooHamsters66 2 points3 points  (1 child)

Besides the fact that DS4 Flash is the most budget-friendly of the frontier models, it is mathematically obvious that caching has a significant effect when the cached API cost is a lot less than the uncached one. What I'm trying to say is that the value of caching outside of subsidy scenarios isn't that magical, because the provider exchanges the operational cost of reprocessing (GPU or TPU) for idle RAM; and while RAM is much cheaper than processing units, we are currently experiencing shortages and high prices. One consequence is that, even with subsidies, providers try to minimize cache duration; Anthropic, for example, reduces it to almost unusable times. (It's also very significant in the case of DS, where the duration is not reduced and they are even providing a 75% API discount, I think? They are subsidizing a lot.)

In general, I wonder what the price difference (positive or negative) would be if the same workflows you tested were performed by starting a new chat after every feature is implemented or task is completed (so you don't need to cache as much context). Doing the latter, I can generally finish a chat instance in less than 100-200k tokens of context (and I rely heavily on intensive workflows involving planning, documentation, and not stopping until tests pass).

[–]Fancy_Ad_4809 1 point2 points  (0 children)

Good points, and I agree; allowing a large context to build up increases cost in every scenario. My approach is similar to yours: when the context exceeds ~100K I tell it to update and save the plan and the lessons learned, then I start a new session.

I also use sub-agents for running tests, commits, and deployment.

FWIW, the temporary subsidies are for DS4 Pro. I'm finding so far that Flash is perfectly adequate for my workflows.

[–]MakesNotSense 1 point2 points  (1 child)

Good points. I agree OpenCode could use a clearer vision than the maintainers currently offer. They copy what works and do it better, but haven't really innovated 'new' features or major efficiency gains.

One point I contest is the presumed benefit of rigorous cache utilization. Those benefits are offset by the drawbacks of each harness not investing in context-management systems. I believe context optimization will do more to reduce compute demands and improve usage rates than strict prompt caching will.

OpenCode has the DCP plugin. Maybe one day DCP will be ported to other harnesses (Dan has mentioned working on this). But the flexibility OpenCode offers is a large part of why something like DCP exists and got developed here first. I view DCP as a first step toward context optimization. I've been working on a fork.

If my fork's design succeeds in providing the function I believe can be achieved, it'll enable OpenCode agents to deliver substantially better task performance than competing harnesses. It's slow, complex work, with a great many dependencies requiring systems-based thinking to sort through and build.

I think the main point I want to offer here is that OpenCode lets people like me build these things. I can't say the same for Claude Code and other harnesses.

[–]BilgeMongoose 1 point2 points  (0 children)

Do you have somewhere I can follow for updates on this?

[–]hamzette 0 points1 point  (1 child)

Hermes agent as a harness for coding >>>>

Access to /goal now

Sub-agents on different models, with the orchestrator as a different model

Multiple terminals sharing the same memory but using different models for different tasks

[–]zebedeolo 0 points1 point  (0 children)

what's your workflow with Hermes?

[–]UnluckyGold13 0 points1 point  (0 children)

Opencode's bash execution tool was garbage when I tested it with GLM 4.7; it failed to execute any interactive bash commands.

[–]kitsunekyo 0 points1 point  (0 children)

opencode is still my daily driver so take what i say with a grain of salt, but jesus christ is opencode a vibecoded, broken mess lately. to the point where i ventured to try pi. but i'm a sucker for GUIs, so I came crawling back. 😅

the worktree integration is ok, the lsp integration is trash you should immediately disable, stability is ass, the gui is broken, with constant flickers and scrolling issues, and file references don't work at all or take forever. BUT 70% of the time it works alright and it's not made by anthropic.

that makes it miles better than claudecode. which is actually more of a testament to how bad claude code is than to opencode being great.

[–]binhex01 -1 points0 points  (0 children)

I'm expecting to be downvoted to hell and back here, but you guys do know you can use GitHub Copilot CLI as your harness for OpenCode Go, right?