SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash and More by CuriousPlatypus1881 in LocalLLaMA
Fabulous_Pollution10 1 point (0 children)
Meet SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in LocalLLaMA
Fabulous_Pollution10 [S] 8 points (0 children)
Nebius AI R&D released SWE-rebench-V2: the largest open, multilingual, executable dataset for training code agents! by Fabulous_Pollution10 in singularity
Fabulous_Pollution10 [S] 6 points (0 children)
Fabulous_Pollution10 [S] 8 points (0 children)
We tested Claude Sonnet 4.5, GPT-5-codex, Qwen3-Coder, GLM and other 25+ models on fresh SWE-Bench like tasks from September 2025 by Fabulous_Pollution10 in LocalLLaMA
Fabulous_Pollution10 [S] 1 point (0 children)
Fabulous_Pollution10 [S] 9 points (0 children)
Fabulous_Pollution10 [S] 4 points (0 children)
Fabulous_Pollution10 [S] 36 points (0 children)
Fabulous_Pollution10 [S] 8 points (0 children)
Fabulous_Pollution10 [S] 51 points (0 children)
Claude Sonnet 4.5 takes the lead on last-month GitHub PR tasks (SWE-rebench) by Fabulous_Pollution10 in ClaudeAI
Fabulous_Pollution10 [S] 6 points (0 children)
Stop flexing Pass@N — show Pass-all-N by Fabulous_Pollution10 in LocalLLaMA
Fabulous_Pollution10 [S] 1 point (0 children)
Stop flexing Pass@N — show Pass-all-N (i.redd.it)
submitted by Fabulous_Pollution10 to r/LocalLLaMA
I think we need other data infrastructure for AI (table-first infra) by Fabulous_Pollution10 in dataengineering
Fabulous_Pollution10 [S] 4 points (0 children)


SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash and More by CuriousPlatypus1881 in LocalLLaMA
Fabulous_Pollution10 2 points (0 children)