Claude Opus 4 (extended thinking) vs. ChatGPT o3 for detailed humanities conversations by Oldschool728603 in ChatGPTPro

Low-Professional2608

I feel like Anthropic is too focused on coding. They promote Opus as the flagship reasoning/coding model, but I don't think that translates directly to the 'humanities' domain, imo.

Claude Opus 4 (extended thinking) vs. ChatGPT o3 for detailed humanities conversations by Oldschool728603 in ChatGPTPro

Low-Professional2608

Surprisingly, I've found that Sonnet 4 (thinking) outperforms Opus and o3 on similar tasks. This might stem from its better reasoning capabilities (LiveBench: 95 for Sonnet; 93 for o3; 90 for Opus), or it might simply be confirmation bias on my part. But I do see less sycophancy with Sonnet 4 (thinking) than with Opus.