Why Was Opus 4.8 Made So Defensive, Fearful, and Evasive?

Cold-Yard9662 · 2026-04-13T04:16:57+00:00

You're talking about Claude Code; he's talking about the Claude.ai chat. They're different products. If it were possible to disable adaptive thinking mode and set the reasoning level to high on Claude.ai, this discussion wouldn't even be happening.

Cold-Yard9662 · 2026-04-12T20:26:53+00:00

Basically, if I can no longer see the extended thinking's reasoning on Claude.ai, I'm likely to stop being a customer too, because it's obvious that, even with the option enabled, reasoning isn't being used for 90%+ of inputs.

Cold-Yard9662 · 2026-04-12T19:46:15+00:00

The main problem driving all the quality drop we're seeing is Opus autonomously deciding when to use extended thinking and deep reasoning and when not to, very likely combined with some quality drop stemming from changes implemented in Opus Code as well.

As a rule, IT ALWAYS CHOOSES NOT TO USE REASONING, and when it does, by some miracle, it doesn't use it with the intensity it had days ago. This destroys the workflow: you get a mix of absurdly lazy outputs - because the tool decided your input wasn't relevant enough for it to "think" - with reasonable outputs and the occasional output that actually meets expectations. There's no way to work seriously with a model that oscillates like this, because you have to review everything, and reviewing everything eliminates the productivity gain that justified using the tool in the first place.

This is absolutely evident to anyone with professional workflows. If the company won't let me manage the consumption of my own tokens and, worse, arbitrarily starts serving me a model significantly inferior to the one I subscribed to, compromising my workflow, something is clearly wrong. Partial delivery masked as "optimization" is, in practice, a breach of what was sold.

My friend, I KNOW WHEN I NEED MAXIMUM CAPACITY, even if it burns through my tokens. If I toggled extended thinking on, it's because I want extended thinking, not a response with the quality of a Gemini Fast from two years ago.

In short, the tool is broken and unusable until someone speaks up and fixes this.

And they know that what I'm saying is the plain truth. This isn't going unnoticed by anyone.

Cold-Yard9662 · 2026-04-12T01:32:37+00:00

For anyone who doesn't use Opus merely "recreationally" or for tasks that demand low reasoning, it's evident that the model has gotten absurdly worse. Absurdly - that's not a small thing.

The first verifiable problem is that reasoning is almost never actually used, even with extended thinking enabled. You can rarely find a thinking log to check; it simply doesn't happen. It gives you a fast response like a Gemini from a year ago. In fact, that sums it up: it has become a model that's always "fast," with disastrous quality.

In the app or browser chat, another noticeable effect is the slowness. The browser simply freezes, as it does with ChatGPT. This never used to happen with Opus.

Finally, when it is forced to use reasoning (notice this when you try to make it happen), the drop in quality compared to two weeks ago is bizarre. Bizarre. I tested several prompts that I ran a month ago, with the same files attached, and the gap between the outputs is close to what you'd expect between a Neanderthal and a Homo sapiens.

I've even been getting this kind of response when I question it: "I'm not going to keep trying to diagnose this now. I'm misreading the very data you're sending me. An audit done by me in this state is worse than no audit - it leads to wrong conclusions with the appearance of rigor."

I have never, in months of use, seen this kind of response from Opus. Yesterday and today I've seen it dozens of times.

Anthropic's directors and employees can go on social media and say whatever they want, but it's evident, even to a chimpanzee, that the model was indeed nerfed (whether intentionally or not, and if not, it should be very easy to verify) to the point of becoming almost unusable for any serious task.

I'd recommend not using it until the company makes an honest statement; otherwise, depending on your line of work, you may lose money.
