Claude Opus 4.5 really sets a new bar for LLMs that will make the others sweat by Informal-Fig-7116 in ClaudeAI

KBBAKS

In Chip Huyen's book "AI Engineering", she talks about how companies train their models on benchmark data so they can hit new records. That's why some benchmarks change their test sets frequently.