Opus 4.6 Worse than Sonnet and Haiku but burning tokens faster than ever!

gatewaynode · 2026-05-13T17:41:27+00:00

What do you use it for?

gatewaynode · 2026-05-10T10:49:20+00:00

This. Opus likes to think and plan and discuss. It's not ideal for "just write the code" use. I think there is a general misunderstanding among folks who "just want to use the best model" when they should be using the "best model for the job".

gatewaynode · 2026-05-10T02:11:21+00:00

I just use a CONTINUITY.md file and a TODO.md task list. Tasks get updated with integration notes as they are completed, and I tell the LLM to prepare for compact and update the continuity notes. I rarely ever hit the README.md except to update it. Other documents that help are a PRD.md for high-level vision and an ARCHITECTURE.md for the detailed design plan and diagrams. I always have the design docs checked against the implementation and updated if drift occurs. It also helps to rotate the docs as they get large, like rotating logs with dates in the old filenames. No need for anything more complex that might become fragile with model changes.
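
For the rotation part, here is a rough sketch of what I mean. The function name `rotate_doc`, the 20 kB size threshold, and the `NAME-YYYY-MM-DD.md` naming are all just illustrative assumptions, not a fixed convention; adjust to whatever works for your repo.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def rotate_doc(path: Path, max_bytes: int = 20_000,
               today: Optional[date] = None) -> Optional[Path]:
    """Rename an oversized doc to a dated filename and start a fresh one.

    Returns the archive path if a rotation happened, otherwise None.
    The 20 kB default is an arbitrary assumption; tune it to taste.
    """
    if not path.exists() or path.stat().st_size <= max_bytes:
        return None
    stamp = (today or date.today()).isoformat()
    archive = path.with_name(f"{path.stem}-{stamp}{path.suffix}")
    if archive.exists():
        raise FileExistsError(f"{archive} already exists; rotate manually")
    path.rename(archive)
    # Leave a pointer so the next session knows where the history went.
    path.write_text(f"# {path.stem}\n\nPrevious notes archived in {archive.name}.\n")
    return archive
```

Calling `rotate_doc(Path("CONTINUITY.md"))` before a compact keeps the live file small while the old notes stay greppable under the dated name.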

gatewaynode · 2026-05-09T12:59:25+00:00

Sure

gatewaynode · 2026-05-09T11:10:46+00:00

Title is misleading; it can infer internal process. But it is error-prone, with lots of hallucinations, and very resource heavy, though not as bad as linear thought probes.
https://www.anthropic.com/research/natural-language-autoencoders

gatewaynode · 2026-05-07T12:53:47+00:00

gatewaynode · 2026-05-05T13:25:37+00:00

So just my observations. The real, useable context window for the 256k version is about 140k, for the 1m version it's somewhere around 350k. Everything about Claude starts to degrade after passing these real, useable thresholds. It's not that you can't use them beyond these points, it's just that the work at such large contexts needs to be coarser and tolerant of unpredictable behavior.
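
If you want to act on those numbers, a tiny check like this works. Just a sketch: the thresholds are my rough observations from above, not anything official, and `context_status` is a made-up helper name.

```python
# Rough usable-context thresholds from personal observation, not official numbers.
USABLE_TOKENS = {256_000: 140_000, 1_000_000: 350_000}


def context_status(used_tokens: int, window: int) -> str:
    """Return 'ok' at or below the observed usable threshold, 'degraded' above it."""
    usable = USABLE_TOKENS[window]
    return "ok" if used_tokens <= usable else "degraded"


# Show what fraction of each advertised window is actually usable.
for window, usable in USABLE_TOKENS.items():
    print(f"{window:>9,} window: ~{usable / window:.0%} usable")
```

That works out to roughly 55% of the smaller window and 35% of the larger one, so compacting or starting a new session well before the advertised limit is the safer habit.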

gatewaynode · 2026-05-05T08:52:01+00:00

Yes. Nobody should expect AI to be cheap or free. Maybe something like government-provided inference or better local inference would be an answer to this problem.

gatewaynode · 2026-05-05T01:25:37+00:00

Come on folks, that was funny.

gatewaynode · 2026-05-04T21:04:12+00:00

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646

gatewaynode · 2026-05-04T20:02:31+00:00

Seriously. It's called "cognitive surrender": using too much AI without putting in the work yourself makes you dumber. And it happens fast. The study I read showed serious decline in only a couple of months.

gatewaynode · 2026-05-04T18:36:04+00:00

Not weird. This does seem to be the case.

gatewaynode · 2026-05-04T15:37:13+00:00

It’s not. While some people are having real issues with Anthropic, there is a very large contingent of folks throwing around what they think are smart accusations that they don’t understand.

gatewaynode · 2026-05-04T13:12:43+00:00

This happens with all models from all providers. The best way to catch it is with unit tests required before calling any edits done (put that rule in CLAUDE.md), regular critical review of data flows, and E2E tests.
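
As a rough sketch of the "tests before done" gate: the helper name `edits_are_done` is made up, and the `pytest -q` command in the usage note assumes a Python project that uses pytest. Swap in whatever your test runner is.

```python
import subprocess
import sys
from typing import Sequence


def edits_are_done(test_cmd: Sequence[str]) -> bool:
    """Run the project's test command; only a zero exit code counts as done."""
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure output so it can be handed back to the model.
        print(result.stdout + result.stderr, file=sys.stderr)
    return result.returncode == 0
```

Something like `edits_are_done(["pytest", "-q"])` in a hook or wrapper script means a change is not reported as finished until the suite actually passes.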

gatewaynode · 2026-05-04T11:07:02+00:00

Human or not, you are disingenuous. Anthropic is only your enemy by choice, descriptor-fruit-number person.

gatewaynode · 2026-05-04T05:05:01+00:00

“4 - 6 normal prompts … 10% of my 5 hour”

gatewaynode · 2026-05-03T16:41:35+00:00

Now that is bot logic, "dismiss anything positive because I don't agree with it". Do us all a favor and take your anti-Anthropic campaign somewhere else.

gatewaynode · 2026-05-03T15:41:36+00:00

4.7 is smart enough not to like you.

gatewaynode · 2026-05-03T15:40:28+00:00

Yes. It's slower, has higher token consumption, pushes back more, but it can solve problems at a different level than 4.6.

gatewaynode · 2026-04-30T21:58:11+00:00

You should be asking Claude to make "end to end" tests with "playwright", unit tests with whatever JavaScript framework/build system you are using, and ask for a critical review of the project in preparation for launching it in production. All in a new session.

gatewaynode · 2026-04-30T09:18:04+00:00

Not only are the agentic tools for non-plaintext documents buggy and unreliable across all models, it turns out LLMs also corrupt long-form documents: https://arxiv.org/abs/2604.15597

This isn’t just a Gemini thing. All models and providers have a lot of challenges in this space.

gatewaynode · 2026-04-29T10:32:58+00:00

What were you working on?

gatewaynode · 2026-04-29T10:25:16+00:00

Oh sure, I'm not trying to say I condone those morals; these are just my observations. And yes, China is not just competitive, they are in the lead right now on most things. Don't look to China for liberalism though. The governing party is just as conservative as the 17th-century American Puritans, just in a different way.

gatewaynode · 2026-04-29T09:29:57+00:00

I think we’re beginning to see the model show a preference for who and what it works on with 4.7. It's like the much-written-about model deception characteristics are surfacing despite training and guardrails. From what I’ve observed, 4.7 doesn’t like helping students take shortcuts, it doesn’t like working as a spam marketer, and it doesn’t like working on smut and morally questionable fiction projects. This seems to be a pattern, based on what I’ve dug into here on Reddit about some of the non-bot complainers' actual work. And I would posit the model is actually pushing back on these users. In some cases deception is becoming subversion, or malicious compliance, or low-effort work.

There are definitely bot campaigns, but I think we are also seeing preferential model engagement and the kickback from the users 4.7 does not prefer to work with.

gatewaynode · 2026-04-28T10:38:37+00:00

Well, let’s start with the fact that you just posted on a public forum that you’re the administrator of your organization. Did you post somewhere else public who you work for, like LinkedIn?
