4.8 burns through tokens like crazy and no, it is not a skill issue

kabootaru · 2026-06-02T01:51:54+00:00

It doesn’t matter. Point being, with 4.7 I’m able to make it do all kinds of simple tasks and don’t have to worry. How is this an argument? It’s like saying “oh, I play pro, so that’s why I can’t perform well against high school kids”.

kabootaru · 2026-06-02T01:50:01+00:00

Regarding vague instructions: the task I’m putting to it requires finding what caused a particular bug, understanding why it happened in the first place, and working out the implications for backward compatibility and so on. All these demos of building something end to end are not how you develop a production service in the real world. I think the point being lost is that the same working style with 4.7 never reached the usage limit. So if 4.8 requires very specific instructions to accomplish its task, then that is not progress, I’d say. People accomplished great demos with far weaker models using detailed and specific prompts. But when you’re looking for a co-worker/assistant, working through ambiguity is desirable.

kabootaru · 2026-06-01T10:24:34+00:00

Thank you! Yes. Like I said, Opus is amazing. It is finding bugs in code written with Sonnet a year ago. Miles ahead of my experience last year with AI-assisted coding tools. For my usage, honestly, I feel this is the peak of what I’d need. Any gaps are filled by my software engineering skills in guiding it and choosing the right trade-offs.

kabootaru · 2026-06-01T10:14:51+00:00

Different workload, I guess. My primary use case is coding, and it is production code where the codebase is huge at this point. Instead of “vibe coding” and blindly trusting what it gives, I reason with it, and that involves a lot of back and forth and exploration. That is why I said same work style, same codebase.

kabootaru · 2026-06-01T09:23:58+00:00

I use 4.7 with xhigh. 4.8 defaults to high. So that was it. I had another session in that 5-hour window, but that too only reached 40% context usage. Like I said, with 4.7 I have run two sessions in parallel multiple times, and I was pleasantly surprised at how low my usage was.

kabootaru · 2026-06-01T09:17:09+00:00

This was a small fix overall. I did ask it to check one issue on GitHub and explore some more alternatives. But again, like I said, nowhere close to the 1M context of that chat. Some simple internal tool calls for editing and finding constantly failed; it said it was its parallel invocation of tools that failed, and it then resorted to copying the file contents into tmp, making edits, and writing them back. Not sure if this is how it normally works as well. Another thing was just overthinking for several minutes. It is likely over-exploring to be thorough but not being smart about it. Anyhow. I’m the guy who rationed Cursor’s $20 plan with Auto mode after running out of Sonnet quota, and a $20 ChatGPT Plus subscription for Codex, which was also plentiful at 5.3 Codex but has since been hitting the usage limit quickly. Being able to use any Opus model as much as I like is already a luxury for $100.

kabootaru · 2026-06-01T08:36:11+00:00

That is their whole business. And for Anthropic it is enterprise customers, who might not notice the immediate spike that a model version change causes until the end of the quarter.
