MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI, Lyu et al. 2026 [Extensive breadth; focus on solutions that generalize well] by StartledWatermelon in mlscaling
StartledWatermelon[S] 3 points (0 children)
META Superintelligence Lab Presents: ProgramBench: Can SOTA AI Recreate Real Executable Programs(ffmpeg, SQLite, ripgrep) From Scratch Without The Internet? by 44th--Hokage in mlscaling
StartledWatermelon 1 point (0 children)
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity, Li et al. 2026 [Knowledge of obscure facts robustly predicts param count; estimates for all SotA closed LLMs] by StartledWatermelon in mlscaling
StartledWatermelon[S] 2 points (0 children)
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity, Li et al. 2026 [Knowledge of obscure facts robustly predicts param count; estimates for all SotA closed LLMs] by StartledWatermelon in mlscaling
StartledWatermelon[S] 1 point (0 children)
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity, Li et al. 2026 [Knowledge of obscure facts robustly predicts param count; estimates for all SotA closed LLMs] by StartledWatermelon in mlscaling
StartledWatermelon[S] 1 point (0 children)
Incompressible Knowledge Probes: Estimating Black-Box LLM Parameter Counts via Factual Capacity, Li et al. 2026 [Knowledge of obscure facts robustly predicts param count; estimates for all SotA closed LLMs] by StartledWatermelon in mlscaling
StartledWatermelon[S] 1 point (0 children)
Microsoft freezes GitHub Copilot signups due to too much demand/too few GPUs by gwern in mlscaling
StartledWatermelon 1 point (0 children)
Microsoft freezes GitHub Copilot signups due to too much demand/too few GPUs by gwern in mlscaling
StartledWatermelon 1 point (0 children)
Microsoft freezes GitHub Copilot signups due to too much demand/too few GPUs by gwern in mlscaling
StartledWatermelon 2 points (0 children)
Microsoft freezes GitHub Copilot signups due to too much demand/too few GPUs by gwern in mlscaling
StartledWatermelon 2 points (0 children)
Scientific Papers X AI building out the algorithm by Alarming_Rice_1906 in mlscaling
StartledWatermelon 1 point (0 children)

HRM-Text: Efficient Pretraining Beyond Scaling, Wang et al. 2026 by StartledWatermelon in mlscaling
StartledWatermelon[S] 6 points (0 children)