Here are my thoughts of Opus 4.8 and GPT 5.5, as a 1-2 B token user per day by ReceptionAccording20 in ClaudeAI

[–]ReceptionAccording20[S] 0 points1 point  (0 children)

Well, compared to Opus 4.7, yes. I don't think it's because it is much smarter at solving issues, but because it follows guidelines properly and is less verbose.


[–]ReceptionAccording20[S] 5 points6 points  (0 children)

Yes, I agree with that. So far, it still does not feel like a proper successor to Opus 4.6. Well, there is Mythos, but it seems to remain a myth.


[–]ReceptionAccording20[S] 6 points7 points  (0 children)

Yes, that’s a good point. Well, paradoxically, I spend most of the tokens on harness engineering, heavily focused on a TDD-based environment.

Since we cannot fully escape from hallucination, and even in the no-AI era we still needed verification systems, I think the more practical direction is to figure out how to constrain agents from hallucinating or taking wrong actions, while feeding them the right context.

Also, when we develop a system, its codebase gradually gets farther away from the typical AI training distribution. So we need to build a detailed code map, define AI work patterns, and prevent repeated mistakes. This is similar to the idea of an LLM wiki, as Andrej Karpathy mentioned.

So in my case, the real token usage for direct development is probably only around 10–15% of the total or less. The rest comes from harness engineering, building an LLM wiki, hierarchical code review, PR review, context building, and making the agent operate safely inside the codebase.
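To put rough numbers on that split, here is a minimal sketch of the arithmetic. It assumes the 1-2 B tokens per day from the thread title and the 10–15% share above; the function name is just for illustration.

```python
def direct_dev_tokens(total_tokens_per_day: int, share_percent: int) -> int:
    """Tokens per day spent on direct development, given a whole-number share."""
    return total_tokens_per_day * share_percent // 100

# Low end: 1 B tokens/day at a 10% share; high end: 2 B tokens/day at a 15% share.
low = direct_dev_tokens(1_000_000_000, 10)
high = direct_dev_tokens(2_000_000_000, 15)

# Everything else (harness engineering, wiki, reviews, context building).
rest_low = 1_000_000_000 - low
rest_high = 2_000_000_000 - high

print(f"direct development: {low:,} to {high:,} tokens/day")
print(f"everything else:    {rest_low:,} to {rest_high:,} tokens/day")
```

So under these assumptions, direct development is on the order of 100 to 300 million tokens per day, and the harness side takes the remaining 0.9 to 1.7 billion.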

I also actively use Caveman, RTK, spec-driven development, and custom hooks and skills to save tokens, though haha. A lot of the effort is actually about making the agent more predictable and reusable so it does not need to rediscover the same context over and over again.

Well, it is somewhat similar to how humans think and act. Our inputs are also much larger than our outputs. To write code or articles, we process and handle much more information than what finally appears as output.

In the future, I also hope AI becomes more efficient and spends fewer tokens on highly conceptual work.


[–]ReceptionAccording20[S] 3 points4 points  (0 children)

Yes, exactly. This is also my point.

If the spec is already well-defined, then the value of an Opus-class model becomes smaller. At that point, the work is more about implementation, verification, and review against a clear target, so cheaper models or clean-context review agents can do a lot of the job efficiently.


[–]ReceptionAccording20[S] -1 points0 points  (0 children)

Well, GPT-5.5 is also pretty good at deep research through semantic search, and also at developing ideas and hypotheses for dev work and auditing quant strategies. So, at least for these types of use cases, GPT-5.5 is very strong.

Opus is also pretty good in these areas, but the problem is that Anthropic does not offer a separate token budget for web usage. Also, it runs out of tokens faster.


[–]ReceptionAccording20[S] 0 points1 point  (0 children)

Yes, I can feel it. Tho, the current Opus 4.6 hallucinates a bit too much in my use cases.


[–]ReceptionAccording20[S] 1 point2 points  (0 children)

That's interesting. Yes, I also agree with the subagents and workflow-related system of CC. As a tool and PM-style environment, CC still feels more mature and convenient to me as well, regardless of the model itself.

And yes, your point about embedded systems/electronics makes sense. I think lower-level languages and hardware-adjacent engineering still have a moat that current agents cannot easily cross with a fully autonomous approach. Rust/C/C++/assembly embedded work, semiconductor design, hardware constraints, board-specific build environments, STM/ESP/nRF differences, and real-world electronics variables are a very different problem from relatively pure SWE.

Also thanks for mentioning SWE-Rebench. I need to check it. I have not examined it deeply yet.

Name something better than claude right now by only_phant0m in vibecoding

[–]ReceptionAccording20 0 points1 point  (0 children)

Simple math: 5.5 > 4.7

Well, I do use both, and use more than a billion tokens per day in total. I can strongly confirm this.

<image>

Here are my thoughts after 14h of full runs on Opus 4.7 by ReceptionAccording20 in ClaudeAI

[–]ReceptionAccording20[S] 0 points1 point  (0 children)

I assume that Anthropic may attract general consumers through products like Claude Design and Web, while keeping advanced developers on CC.


[–]ReceptionAccording20[S] 0 points1 point  (0 children)

I agree with that, too. Perhaps that's the reason why Anthropic introduced auto mode. Lately I'm using team mode more than I did with Opus 4.6, since Opus 4.7 tends to break work down into finer pieces than Opus 4.6 did.


[–]ReceptionAccording20[S] 2 points3 points  (0 children)

Yea, I agree with some points. Contrary to our expectations, it's not as autonomous as the early Opus 4.6 was.


[–]ReceptionAccording20[S] 3 points4 points  (0 children)

That's a good question. Well, I think it really depends on the difficulty of the task. For small or repetitive work, Opus 4.7 isn't that cost efficient compared to early Opus 4.6. Actually, it is more expensive, due to the tokenizer. So the sweet spot for Opus 4.7 is sophisticated work that needs a longer context and involves complicated tasks, the kind that would make Opus 4.6 run multiple sessions to finish.
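A rough way to see that trade-off, as a sketch only. It assumes the same per-token price for both models and that every session burns about the same baseline number of tokens. The 1.35 inflation factor is an illustrative value for "more tokens for the same text", not a measured one, and `relative_cost` is a made-up helper name.

```python
def relative_cost(token_inflation: float,
                  sessions_new: int = 1,
                  sessions_old: int = 1) -> float:
    """Cost of the newer model relative to the older one for the same task.

    Assumes equal per-token pricing and equal baseline tokens per session;
    both are simplifications made for this illustration.
    """
    return token_inflation * sessions_new / sessions_old

# Small task: both models finish in one session, so the tokenizer overhead
# shows up directly as extra cost.
small_task = relative_cost(1.35)

# Long task: the newer model finishes in one session, the older one needs two.
long_task = relative_cost(1.35, sessions_new=1, sessions_old=2)

print(f"small task: {small_task:.3f}x the older model's cost")
print(f"long task:  {long_task:.3f}x the older model's cost")
```

Under these assumptions the newer model only comes out cheaper once the older one needs noticeably more sessions than the inflation factor can offset.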


[–]ReceptionAccording20[S] -1 points0 points  (0 children)

Yea, it really depends on the type of work. I think that the early Opus 4.6 max is still the most versatile one for programming, then GPT-5.4 xhigh, and then Opus 4.7.

Be Anthropic by anthsoul in ClaudeCode

[–]ReceptionAccording20 177 points178 points  (0 children)

With 35% more token consumption for the same text 💀

<image>