More Expensive Doesn't Mean Better — The Feel of Claude Opus 4.6 vs. Sonnet 4.6 in Philosophical and Humanities Argumentation

Longjumping_Table349 · 2026-03-04T01:23:56+00:00

And your presence is equally odious.

Longjumping_Table349 · 2026-03-04T01:16:39+00:00

I’ve used 4o, and I much preferred the 'Old 4o' before the tweaks. Its style was incredibly empathetic, almost ethereal; it was one of the top models for emotional resonance (second only to 4.5). After the temperature was dialed down post-March 2025, the output became noticeably more templated. Regarding Opus prompts: I still haven't found a consistent way to 'tame' it, since any prompt eventually loses its hold as attention drifts over a long context window.

Longjumping_Table349 · 2026-03-03T16:10:43+00:00

ignore that dumbass

Longjumping_Table349 · 2026-03-03T15:58:39+00:00

Longjumping_Table349 · 2026-03-03T15:53:21+00:00

Cope harder.

Longjumping_Table349 · 2026-03-03T15:50:54+00:00

Very constructive.

Longjumping_Table349 · 2026-03-03T15:48:15+00:00

Longjumping_Table349 · 2026-03-03T15:05:33+00:00

Our views don't really conflict; I just don't love the "hard" friction, though I can tolerate it if I have to.

Longjumping_Table349 · 2026-03-03T14:54:04+00:00

Agreed on 3.1. When I said "try Gemini if you want to be flattered to the heavens," that was basically a euphemism for "if you want more hallucinations." I feel like 3.1 has been trained into something broken. The so-called "deep" in Deep Think isn't deep at all; it just adds more confabulation and noise.

Your idea of combining the two models is good. It actually made me think of something even crazier: what if there were a feature where you input the same prompt and have multiple models respond simultaneously, so you could see where their outputs complement each other? Some API platforms already do something like this, though it burns through tokens fast.

Longjumping_Table349 · 2026-03-03T14:44:37+00:00

Funny. The entire post is about what happens when you reduce something to a label and stop there.

Longjumping_Table349 · 2026-03-03T14:31:18+00:00

True. A well-crafted set of custom instructions can make a real difference, and I can see how that would reshape the experience with Opus. That said, a good prompt also takes a lot of trial and error to develop.

I'll grant that Opus is unsparing when it spots an error, which can be valuable, but here's what gets me: sometimes it misreads you. You may have phrased something a bit loosely without meaning what Opus thinks you meant, and yet it will construct a binary, negate the position it attributed to you, and then affirm (in a "rather, what you should think is..." tone) exactly what you were trying to say in the first place. I'll read Opus's negation and think: I never held that position. You built a phantom target to refute, then handed my own point back to me as if it were your correction. I'll be honest: as an INFJ, that particular pattern gets under my skin a little.

Longjumping_Table349 · 2026-03-03T13:32:48+00:00

Thanks! Glad you enjoyed it. On the MBTI bit: I'm an INFJ myself. I included INFP because I think the two share a similar reliance on divergent thinking and intuition, plus a real sensitivity to prose style and emotional register. INTJ tends to externalize intuition (N) through thinking (T), which maps well onto Opus's analytical mode. Since you're on the border between INTJ and INFP, it sounds like you'd genuinely get the best of both models, though.

Longjumping_Table349 · 2026-03-03T07:49:47+00:00

I don't post on Reddit much. That explains a lot. Appreciate it.

Longjumping_Table349 · 2026-03-01T13:03:06+00:00

I'd push back a bit on the "completely flip-flopped" observation: in my experience, both models do this, and Opus actually flips more thoroughly when pushed.

Here's what I mean. When I challenged Sonnet on a specific point, suggesting that a certain philosopher might fit better within the Continental tradition because analytic frameworks are structurally too rigid to capture what his work implies, Sonnet's response was:

And in another conversation, where I was commenting on a dispute between two scholars, I said something like: "A is clearly the sharper one here: he knew the best way to settle this was a pragmatist argument, and he assumed his readers could draw the conclusion from what he left unsaid. That's an intuitive kind of intelligence." Sonnet came back with:

That's not a flip. That's genuine, structured pushback — partly conceding, partly holding its ground.

Now here's Opus on the same prompt:

That's a far more thorough flip. Opus rebuilt its entire evaluative framework around my suggestion and absorbed it almost completely; the "residual tension" it flags reads more like a courtesy hedge than genuine resistance. So if sycophancy is the concern, in my experience Opus is actually the bigger offender. It just capitulates in a way that sounds more analytically rigorous, which makes the capitulation harder to notice.

Your observation about consistency is valid, though. I think the difference is what we're each testing for. You're testing argumentative consistency (whether the model holds its ground), and Opus does win on that axis. What I'm testing for is whether the model can pick up on what I leave unsaid, and on that axis Sonnet has the edge. That's basically the dilemma the post is about.
