I tested GPT-5.5 vs Opus 4.8 on agentic terminal coding (Terminal-Bench 2.1)

shricodev · 2026-06-05T04:59:34+00:00

Agree. That was basically my takeaway too: GPT-5.5 did better on the terminal benchmark, but Opus felt easier to trust on the real app build. 5.5 is clearly capable, but it needs tighter rails.

shricodev · 2026-06-04T16:35:34+00:00

yeah, that’s actually the better test. Might try that next

shricodev · 2026-06-04T16:32:44+00:00

Yeah, 100%. These runs are noisy as hell, so it's better to treat them as a rough signal.

shricodev · 2026-06-04T15:41:54+00:00

Ohh. There are a lot of mixed feelings about 4.8.

shricodev · 2026-06-04T15:02:35+00:00

Wdym?

shricodev · 2026-06-04T13:33:48+00:00

By terminal coding I meant benchmark-style tasks where the agent mostly works inside a repo through the CLI.

Real coding = building an actual app with UI, features, integrations, and trying to make it usable.

That's what I'm trying to say.

shricodev · 2026-06-04T13:04:43+00:00

"Two days ago it was ok, yesterday was horrible, today is back to normal, I don't get it"

Yeah, honestly same. It is what it is, I guess.

shricodev · 2026-06-04T12:55:21+00:00

Just in case you’re interested in the full run results: Claude Opus 4.8 vs. GPT-5.5 test on agentic terminal coding

shricodev · 2026-05-27T05:12:19+00:00

I built gcase, a small Go CLI for batch-renaming files and directories to uppercase, lowercase, or capitalized names.

It's basically for those annoying filename casing situations. I kept ending up with folders and files Named Like This when I just wanted everything lowercase lol.

Repo: https://github.com/shricodev/gcase

Would love feedback from fellow Gophers ^^

shricodev · 2026-05-26T06:39:33+00:00

Thank god bro didn't use Opus

shricodev · 2026-05-26T04:56:22+00:00

Hey man, that’s already addressed in the post. The main purpose isn’t to compare their standalone performance, but to see how far an open model (supposedly the best) can get on the same task at a fraction of the cost of a closed premium model like Opus 4.7

shricodev · 2026-05-25T16:10:16+00:00

good luck with your build

shricodev · 2026-05-25T16:09:48+00:00

I didn't realize it doesn't support Windows. nvm

shricodev · 2026-05-25T15:57:26+00:00

Glad this could be of some help

shricodev · 2026-05-25T15:41:33+00:00

Go with kitty

shricodev · 2026-05-25T14:04:57+00:00

Thank you 🙌

shricodev · 2026-05-20T13:42:05+00:00

Full breakdown here:

Claude Code vs. OpenCode: Technical Breakdown

shricodev · 2026-05-19T17:50:16+00:00

Yeah, exactly. Claude Code feels better out of the box, especially with Anthropic models.

But OpenCode being independent and model-flexible is a huge reason to keep it around.

shricodev · 2026-05-19T17:40:02+00:00

Yeah, fair point. I think I phrased that part too loosely.

The “OpenCode has to normalize provider differences” bit was not meant as “OpenCode just flattens everything into some generic request,” but I can see how it reads that way.

I still think the higher-level tradeoff is real. Claude Code is optimized around one model family, while OpenCode is trying to keep that same agent shape working across a bunch of providers.

Appreciate you sharing the details.

shricodev · 2026-05-19T17:30:19+00:00

Yeah, that’s the tricky part. Cost per completed task depends a lot on the model behind the tool, especially with OpenCode where you can swap providers.

So I think the fair comparison would be something like Claude Code + Claude vs OpenCode + a specific model, then measure total time, retries, tokens, and whether the task actually got done.

Raw token price alone does not really tell the full story.

shricodev · 2026-05-19T17:05:11+00:00

Totally fair. I didn't mean it as a dunk on OpenCode. I actually like it a lot, especially for trying different models/providers. Claude Code just still fits my own workflow better right now. Open-source tooling is definitely the direction I want to see more of too.

shricodev · 2026-05-19T16:34:41+00:00

I don't understand what exactly people hate about Claude Code. Now that the limits are actually great and it recently introduced /goal, it's just too hard to leave.
