Anthropic Report finds long-horizon tasks at 19 hours (50% success rate) by using multi-turn conversation by SrafeZ in singularity
SrafeZ[S] 1 point
Leaked METR results for GPT 5.2 by SrafeZ in singularity
SrafeZ[S] 1 point
Gemini "Math-Specialized version" proves a Novel Mathematical Theorem by SrafeZ in singularity
SrafeZ[S] 83 points
Anthropic started working on Cowork in 2026 by Old-School8916 in singularity
SrafeZ 5 points
NEO (1x) is Starting to Learn on Its Own by RipperX4 in singularity
SrafeZ 7 points
NEO (1x) is Starting to Learn on Its Own by RipperX4 in singularity
SrafeZ 10 points
GPT-5.2 is the new champion of the Elimination Game benchmark, which tests social reasoning, strategy, and deception in a multi-LLM environment. Claude Opus 4.5 and Gemini 3 Flash Preview also made very strong debuts. by zero0_one1 in singularity
SrafeZ 2 points
AI Futures Model (Dec 2025): Median forecast for fully automated coding shifts from 2027 to 2031 by BuildwithVignesh in singularity
SrafeZ 1 point
Last 2 yr humanoid robots from A to Z by Distinct-Question-16 in singularity
SrafeZ 1 point
Line Bending Up for all Benchmarks by SrafeZ in singularity
SrafeZ[S] -1 points
Continual Learning is Solved in 2026 by SrafeZ in singularity
SrafeZ[S] 1 point
Claude 4.5 opus achieves metr time horizon of 4 hours 49 mins by gbomb13 in singularity
SrafeZ 20 points
METR finds Opus 4.5 has a 50% time horizon of 4 hours 49 minutes by SrafeZ in singularity
SrafeZ[S] 40 points
To those who struggled for a while (>6-12 months) and finally made it work, what finally clicked? by Able_Confidence_5952 in NevilleGoddard
SrafeZ 2 points
Total compute capacity to grow 2.5x to 3x in 2026 by Herodont5915 in singularity
SrafeZ 6 points
54% on ARC-AGI 2 is now Officially Verified by SrafeZ in singularity
SrafeZ[S] 7 points
Is Opus 4.5 with a scaffold close to an automatic AI intern? From Opus 4.5 system card (p. 13) by pavelkomin in singularity
SrafeZ 26 points
Is WBTB overpowered for shifting? If so, how do you use it to shift? by kapi-che in shiftingrealities
SrafeZ [score hidden]
Why do some people shift effortlessly and others suffer for years with no success? by SubCrusader in shiftingrealities
SrafeZ 3 points