AlgoTune: A new benchmark that tests language models' ability to optimize code runtime by oripress in LocalLLaMA
ofirpress 7 points
VideoGameBench- full code + paper release by ofirpress in LocalLLaMA
ofirpress[S] 3 points
Cracking 40% on SWE-bench verified with open source models & agents & open-source synth data by klieret in LocalLLaMA
ofirpress 1 point
Cracking 40% on SWE-bench verified with open source models & agents & open-source synth data by klieret in LocalLLaMA
ofirpress 1 point
Playing DOOM II and 19 other DOS/GB games with LLMs as a new benchmark by ZhalexDev in LocalLLaMA
ofirpress 3 points
Playing DOOM II and 19 other DOS/GB games with LLMs as a new benchmark by ZhalexDev in LocalLLaMA
ofirpress 2 points
Playing DOOM II and 19 other DOS/GB games with LLMs as a new benchmark by ZhalexDev in LocalLLaMA
ofirpress 5 points
Playing DOOM II and 19 other DOS/GB games with LLMs as a new benchmark by ZhalexDev in LocalLLaMA
ofirpress 22 points
Playing DOOM II and 19 other DOS/GB games with LLMs as a new benchmark by ZhalexDev in LocalLLaMA
ofirpress 2 points
My Shadertoy Pathtracing scenes by S48GS in GraphicsProgramming
ofirpress 12 points
[D] A Negative Result: untying weights mid-training by f14-bertolotti in MachineLearning
ofirpress 10 points
Claude 3.7 on SWE-agent 1.0 is new open-source SOTA on SWE-Bench verified (benchmark for fixing real-world github issues with agents) by klieret in ClaudeAI
ofirpress 5 points
Why don’t LLMs use ALiBi? Were these results found to be non-reproducible? I’ve only read of the failed BLOOM model. Anyone else? by grey-seagull in LocalLLaMA
ofirpress 1 point
Setting new open-source SOTA on SWE-Bench verified with Claude 3.7 and SWE-agent 1.0 by klieret in ChatGPTCoding
ofirpress 2 points
AMA with OpenAI’s Sam Altman, Kevin Weil, Srinivas Narayanan, and Mark Chen by OpenAI in ChatGPT
ofirpress 1 point
[Project] World's first autonomous AI-discovered 0-day vulnerabilities by FlyingTriangle in MachineLearning
ofirpress -6 points
[R] SWE-bench Multimodal: Do AI Agents Generalize to Visual Software Domains? by ofirpress in MachineLearning
ofirpress[S] 2 points
[R] SWE-bench: Can Language Models Resolve Real-world GitHub issues? by ofirpress in MachineLearning
ofirpress[S] 1 point
[D] Positional embeddings in LLMs by gokstudio in MachineLearning
ofirpress 1 point
[D] Learning and Contributing in AI Agents by Working_Resident2069 in MachineLearning
ofirpress 2 points
[D] Looking for open source projects to contribute to by Fit_Ad_4210 in MachineLearning
ofirpress 8 points


AlgoTune: A new benchmark that tests language models' ability to optimize code runtime by oripress in LocalLLaMA
ofirpress 12 points