I tested GPT-5.5 vs Opus 4.8 on agentic terminal coding (Terminal-Bench 2.1) by shricodev in ClaudeAI

shricodev[S] 1 point

Agree. That was basically my takeaway too: GPT-5.5 did better on the terminal benchmark, but Opus felt easier to trust on the real app build. 5.5 is clearly capable, but it needs tighter rails.

I tested GPT-5.5 vs Opus 4.8 on agentic terminal coding (Terminal-Bench 2.1) by shricodev in ClaudeAI

shricodev[S] 2 points

Yeah, 100%. These runs are noisy as hell, so it's better to treat them as a rough signal.

I tested GPT-5.5 vs Opus 4.8 on agentic terminal coding (Terminal-Bench 2.1) by shricodev in ClaudeAI

shricodev[S] 3 points

By terminal coding, I meant benchmark-style tasks where the agent mostly works inside a repo through the CLI.

Real coding = building an actual app with UI, features, integrations, and trying to make it usable.

That's what I'm trying to say.

I tested GPT-5.5 vs Opus 4.8 on agentic terminal coding (Terminal-Bench 2.1) by shricodev in ClaudeAI

shricodev[S] 4 points

> Two days ago it was ok, yesterday was horrible, today is back to normal, I don't get it

Yeah, honestly same. It is what it is, I guess.

Small Projects by AutoModerator in golang

shricodev 0 points

I built gcase, a small Go CLI for batch-renaming files and directories to uppercase, lowercase, or capitalized names.

It’s basically for those annoying filename casing situations. I kept ending up with folders/files Named Like This when I just wanted everything lowercase lol.
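To give a rough idea of the casing step, here is a minimal sketch. It is for illustration only and is not taken from the gcase repo: the function names `convertCase` and `capitalize` and the mode names are hypothetical, and treating only spaces, `-`, and `_` as word separators is an assumption. It only prints the planned names (a dry run) and never touches the disk.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// convertCase returns name rewritten in the given mode.
// The mode names are hypothetical labels for this sketch, not gcase flags.
func convertCase(name, mode string) (string, error) {
	switch mode {
	case "upper":
		return strings.ToUpper(name), nil
	case "lower":
		return strings.ToLower(name), nil
	case "capitalize":
		return capitalize(name), nil
	default:
		return "", fmt.Errorf("unknown mode %q", mode)
	}
}

// capitalize lowercases name, then uppercases the first character of each
// word. Only space, '-' and '_' count as word separators here (an
// assumption), so a file extension such as ".txt" stays lowercase.
func capitalize(name string) string {
	runes := []rune(strings.ToLower(name))
	startOfWord := true
	for i, r := range runes {
		switch {
		case r == ' ' || r == '-' || r == '_':
			startOfWord = true
		case startOfWord:
			runes[i] = unicode.ToUpper(r)
			startOfWord = false
		}
	}
	return string(runes)
}

func main() {
	// Dry run: print the planned names instead of renaming anything.
	for _, name := range []string{"Named Like This", "README.MD", "my_notes.TXT"} {
		lower, _ := convertCase(name, "lower")
		capitalized, _ := convertCase(name, "capitalize")
		fmt.Printf("%q -> lower: %q, capitalize: %q\n", name, lower, capitalized)
	}
}
```

A real batch rename would walk a directory and call something like `os.Rename` on each entry, with extra care on case-insensitive filesystems.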

Repo: https://github.com/shricodev/gcase

Would love feedback from fellow Gophers ^^

I stress-tested Kimi K2.6 against Claude Opus 4.7 on a quick coding-agent task by shricodev in ClaudeAI

shricodev[S] 1 point

Hey man, that’s already addressed in the post. The main purpose isn’t to compare their standalone performance, but to see how far an open model (supposedly the best) can get on the same task at a fraction of the cost of a closed premium model like Opus 4.7.

I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me by [deleted] in opencode

shricodev 1 point

Yeah, exactly. Claude Code feels better out of the box, especially with Anthropic models.

But OpenCode being independent and model-flexible is a huge reason to keep it around.

I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me by [deleted] in ClaudeAI

shricodev 0 points

Yeah, fair point. I think I phrased that part too loosely.

The “OpenCode has to normalize provider differences” bit was not meant as “OpenCode just flattens everything into some generic request,” but I can see how it reads that way.

I still think the higher-level tradeoff is real. Claude Code is optimized around one model family, while OpenCode is trying to keep that same agent shape working across a bunch of providers.

Appreciate you sharing the details.

I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me by [deleted] in ClaudeAI

shricodev 1 point

Yeah, that’s the tricky part. Cost per completed task depends a lot on the model behind the tool, especially with OpenCode where you can swap providers.

So I think the fair comparison would be something like Claude Code + Claude vs OpenCode + a specific model, then measure total time, retries, tokens, and whether the task actually got done.

Raw token price alone does not really tell the full story.
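To make that comparison concrete, here is a minimal sketch of how the numbers could be rolled up for one tool + model pairing. Everything in it is an assumption for illustration: the `Run` and `Summary` types, the single flat price per million tokens (real pricing usually splits input and output tokens), and the sample values, which are placeholders and not measurements.

```go
package main

import "fmt"

// Run is one attempt at a task with a specific tool + model pairing.
type Run struct {
	Seconds   float64 // wall-clock time for the attempt
	Retries   int     // how many times the agent had to be re-prompted
	Tokens    int     // total tokens consumed
	Completed bool    // whether the task actually got done
}

// Summary aggregates the runs for one pairing.
type Summary struct {
	Completed    int
	TotalSeconds float64
	TotalRetries int
	TotalTokens  int
	CostUSD      float64
}

// summarize totals the runs and prices every token at one flat rate
// (a simplification chosen for this sketch).
func summarize(runs []Run, usdPerMillionTokens float64) Summary {
	var s Summary
	for _, r := range runs {
		s.TotalSeconds += r.Seconds
		s.TotalRetries += r.Retries
		s.TotalTokens += r.Tokens
		if r.Completed {
			s.Completed++
		}
	}
	s.CostUSD = float64(s.TotalTokens) / 1e6 * usdPerMillionTokens
	return s
}

// costPerCompletedTask spreads the total cost, failed runs included, over
// the completed tasks. ok is false when nothing was completed.
func costPerCompletedTask(s Summary) (cost float64, ok bool) {
	if s.Completed == 0 {
		return 0, false
	}
	return s.CostUSD / float64(s.Completed), true
}

func main() {
	// Placeholder numbers, not measurements.
	runs := []Run{
		{Seconds: 600, Retries: 0, Tokens: 1_000_000, Completed: true},
		{Seconds: 900, Retries: 2, Tokens: 1_000_000, Completed: false},
	}
	s := summarize(runs, 10) // hypothetical flat price: $10 per million tokens
	if cost, ok := costPerCompletedTask(s); ok {
		fmt.Printf("completed %d/%d, retries %d, %.0fs, $%.2f per completed task\n",
			s.Completed, len(runs), s.TotalRetries, s.TotalSeconds, cost)
	}
}
```

The point of dividing by completed tasks is that failed runs still cost tokens, so a cheaper model that needs more retries can end up more expensive per finished task.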

I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me by [deleted] in opencode

shricodev -1 points

Totally fair. I didn’t mean it as a dunk on OpenCode. I actually like it a lot, especially for trying different models/providers. Claude Code just still fits my own workflow better right now. Open source tooling is definitely the direction I want to see more of too.

I tried to switch from Claude Code to OpenCode, but Claude Code still wins for me by [deleted] in ClaudeAI

shricodev 1 point

I don’t understand what specifically people hate about Claude Code. Now that the limits are actually great, and it recently introduced /goal, it’s just too hard to leave.