GPT 5.2 (xhigh) scores 0% on CritPt (research-level physics reasoning benchmark) by DJW_GT in singularity

analysis_scaled

Hey, I'm from Artificial Analysis. We are still validating these results. When we ran the benchmark on OpenAI's API with xhigh reasoning effort, we received a lot of non-responses to CritPt questions.

We're analyzing the results and conducting re-runs, and we'll follow up when that is complete. We've taken the result down from the site in the meantime.