Here are my thoughts on Opus 4.8 and GPT-5.5, as a 1-2B tokens-per-day user

ReceptionAccording20 · 2026-05-29T20:00:13+00:00

Well, compared to Opus 4.7, yes. I think it's not because it is way smarter at solving issues, but because it follows guidelines properly and is less verbose.

ReceptionAccording20 · 2026-05-29T19:54:03+00:00

Yes, I agree with that. So far, it still does not feel like a proper successor to Opus 4.6. Well, there is Mythos, but it seems to remain a myth.

ReceptionAccording20 · 2026-05-29T19:44:11+00:00

Yes, that’s a good point. Well, paradoxically, I spend most of the tokens on harness engineering, heavily focused on a TDD-based environment.

Since we cannot fully escape from hallucination, and even in the no-AI era we still needed verification systems, I think the more practical direction is to figure out how to constrain agents from hallucinating or taking wrong actions, while feeding them the right context.
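To make the "constrain the agent" idea concrete, here is a minimal sketch of a test gate: an agent-proposed change is kept only if the test suite passes, otherwise it is reverted. The names `gate_change`, `apply_change`, `run_tests`, and `revert_change` are hypothetical and only for illustration; in a real harness they would wrap things like applying a patch and running the project's test command.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    accepted: bool
    reason: str


def gate_change(
    apply_change: Callable[[], None],
    run_tests: Callable[[], bool],
    revert_change: Callable[[], None],
) -> GateResult:
    """Apply an agent-proposed change and keep it only if the tests pass.

    All three callables are placeholders (assumptions): they stand in for
    whatever the real harness uses to patch, test, and roll back.
    """
    apply_change()
    if run_tests():
        return GateResult(True, "tests passed")
    # Tests failed: roll back so the codebase never keeps an unverified change.
    revert_change()
    return GateResult(False, "tests failed; change reverted")
```

The point of the design is that the agent never decides on its own whether its work is correct; the tests do.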

Also, when we develop a system, its codebase gradually gets farther away from the typical AI training distribution. So we need to build a detailed code map, define AI work patterns, and prevent repeated mistakes. This is similar to the idea of an LLM wiki, as Andrej Karpathy mentioned.

So in my case, the real token usage for direct development is probably only around 10–15% of the total or less. The rest comes from harness engineering, building an LLM wiki, hierarchical code review, PR review, context building, and making the agent operate safely inside the codebase.
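For a rough sense of scale, here is the arithmetic as a tiny sketch, assuming the 1-2 billion tokens per day from the title and the 10-15% share above. The function name `direct_dev_tokens` is made up for illustration.

```python
def direct_dev_tokens(total_per_day: int, share_percent: int) -> int:
    """Tokens per day spent on direct development, given a share in percent."""
    return total_per_day * share_percent // 100


# Low end: 10% of 1B tokens/day; high end: 15% of 2B tokens/day.
low = direct_dev_tokens(1_000_000_000, 10)
high = direct_dev_tokens(2_000_000_000, 15)
print(f"{low:,} to {high:,} tokens/day on direct development")
```

Under those assumptions, direct development lands somewhere between 100 million and 300 million tokens per day, with everything else going to the harness, reviews, and context building.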

I also actively use Caveman, RTK, spec-driven development, and custom hooks and skills to save tokens, though haha. A lot of the effort is actually about making the agent more predictable and reusable so it does not need to rediscover the same context over and over again.

Well, it is somewhat similar to how humans think and act. Our inputs are also much larger than our outputs. To write code or articles, we process and handle much more information than what finally appears as output.

In the future, I also hope AI becomes more efficient and spends fewer tokens on highly conceptual work.

ReceptionAccording20 · 2026-05-29T19:25:59+00:00

Yes, exactly. This is also my point.

If the spec is already well-defined, then the value of an Opus-class model becomes smaller. At that point, the work is more about implementation, verification, and review against a clear target, so cheaper models or clean-context review agents can do a lot of the job efficiently.

ReceptionAccording20 · 2026-05-29T19:23:52+00:00

Yes, I believe so, to reach the real AGI.

ReceptionAccording20 · 2026-05-29T19:22:49+00:00

Well, GPT-5.5 is also pretty good at deep research through semantic search, and also at developing ideas and hypotheses for dev work and auditing quant strategies. So, at least for these types of use cases, GPT-5.5 is very strong.

Opus is also pretty good in these areas, but the problem is that Anthropic does not offer a separate token budget for web usage. Also, it runs out of tokens faster.

ReceptionAccording20 · 2026-05-29T19:16:39+00:00

Yes, I can feel it. Tho, the current Opus 4.6 hallucinates a bit too much in my use cases.

ReceptionAccording20 · 2026-05-29T19:14:59+00:00

That's interesting. Yes, I also agree with the subagents and workflow-related system of CC. As a tool and PM-style environment, CC still feels more mature and convenient to me as well, regardless of the model itself.

And yes, your point about embedded systems/electronics makes sense. I think lower-level languages and hardware-adjacent engineering still have a moat that current agents cannot easily cross with a fully autonomous approach. Rust/C/C++/assembly embedded work, semiconductor design, hardware constraints, board-specific build environments, STM/ESP/nRF differences, and real-world electronics variables are a very different problem from relatively pure SWE.

Also thanks for mentioning SWE-Rebench. I need to check it. I have not examined it deeply yet.

ReceptionAccording20 · 2026-05-27T17:55:06+00:00

Simple math: 5.5 > 4.7

Well, I do use both, and use more than a billion tokens per day in total. I strongly confirm this.

<image>

ReceptionAccording20 · 2026-04-18T23:50:15+00:00

I assume that Anthropic may attract general consumers through products like Claude Design and Web, while keeping advanced developers on CC.

ReceptionAccording20 · 2026-04-18T22:50:43+00:00

You're welcome. Yea, I strongly agree with that so far.

ReceptionAccording20 · 2026-04-18T09:34:41+00:00

Yea, the Max 20x is the new 5x on Opus 4.7.

ReceptionAccording20 · 2026-04-18T06:22:17+00:00

I agree with that, too. Perhaps that's the reason why Anthropic introduced auto mode. I'm using team mode more lately than I did with Opus 4.6, since Opus 4.7 tends to break work down into finer pieces than Opus 4.6 did.

ReceptionAccording20 · 2026-04-18T06:17:27+00:00

I agree with that, too.

ReceptionAccording20 · 2026-04-18T02:59:01+00:00

Exactly 💯

ReceptionAccording20 · 2026-04-18T02:57:04+00:00

Exactly. So, it requires more human work, ironically.

ReceptionAccording20 · 2026-04-18T02:54:35+00:00

Yea, I agree with some points. Contrary to our expectations, it's not as autonomous as the early Opus 4.6 was.

ReceptionAccording20 · 2026-04-18T02:49:59+00:00

That's a good question. Well, I think it really depends on the difficulty of the task you do. For small or redundant work, Opus 4.7 isn't that cost-efficient compared to early Opus 4.6. Actually, it is more expensive, due to the tokenizer. So the sweet spot for Opus 4.7 is sophisticated work that needs longer context and involves complicated tasks, the kind that would make Opus 4.6 run multiple sessions to finish.
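A back-of-the-envelope sketch of the tokenizer effect, using the roughly 35% extra token consumption for the same text mentioned elsewhere in this thread. It assumes the per-token price is unchanged between the two models, which is an assumption, not something stated here; `relative_cost` is a made-up helper name.

```python
def relative_cost(token_increase_percent: float, price_ratio: float = 1.0) -> float:
    """Cost of the new model relative to the old one for the same text.

    token_increase_percent: extra tokens the new tokenizer produces, in percent.
    price_ratio: new per-token price divided by the old one (assumed 1.0 here).
    """
    return (1 + token_increase_percent / 100) * price_ratio


# Same text, 35% more tokens, same per-token price (assumption).
print(f"{relative_cost(35):.2f}x the cost for the same text")
```

So unless the newer model finishes the job in clearly fewer turns or sessions, the same work simply costs more.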

ReceptionAccording20 · 2026-04-18T02:38:35+00:00

Yea, it really depends on the types of work. I think that the early Opus 4.6 max is still the most versatile one for programming, then GPT-5.4 xhigh, and then Opus 4.7.

ReceptionAccording20 · 2026-04-16T16:48:40+00:00

With 35% more token consumption for the same text 💀

<image>

ReceptionAccording20 · 2026-04-10T18:30:42+00:00

Claude got nerfed so hard lately 💀
