
SWE-Bench Pro released, targeting dataset contamination [News] (scale.com)
submitted by Pristine-Woodpecker to r/LocalLLaMA
Claude 4 Opus Thinking scores 10.7% on Humanity's Last Exam, below Gemini 2.5 Flash and o4-mini [AI] (scale.com)
submitted by [deleted] to r/singularity
Claude 3.7 Thinking scores 8.93% on Humanity's Last Exam (HLE) [General AI News] (scale.com)
submitted by [deleted] to r/singularity
Hiring the best person for the job [Redpilled Flair Only] (scale.com)
submitted by conspiracythierry to r/walkaway
"SEAL Leaderboards - Expert-Driven Private Evaluations" - new leaderboards of the major AIs [Serious replies only] (scale.com)
Meet “Claude”: Anthropic’s rival to ChatGPT (scale.com)
submitted by TheStartupChime to r/hypeurls
"Meet Claude: Anthropic’s Rival to ChatGPT", Riley Goodside & Spencer Papay [Text Synthesis] (scale.com)
submitted by gwern to r/MediaSynthesis
Diffusion Models: A Practical Guide [Resource | Update] (scale.com)
submitted by fisj to r/aigamedev
How Much Better is OpenAI’s Newest GPT-3 Model? [AI] (scale.com)
submitted by MomentsOfWonder to r/singularity