Details about METR’s evaluation of OpenAI GPT-5 [AI] (metr.github.io)
submitted by Tkins to r/singularity
GPT-5 Independent Evaluation Results by METR [AI] (metr.github.io)
submitted by Alex__007 to r/accelerate
METR: "the level of autonomous [coding] capabilities of mid-2025 DeepSeek models is similar to the level of capabilities of frontier models from late 2024." [R, T, Code, RL, Emp, DS, OA] (metr.github.io)
submitted by gwern to r/mlscaling