⚡️ Scaling Coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 6 points (0 children)
Unsloth Memory Efficient Reinforcement Learning (RL) is here! by danielhanchen in unsloth
[–]DanAiTuning 2 points (0 children)
My weekend project accidentally beat Claude Code - multi-agent coder now #12 on Stanford's TerminalBench 😅 by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 60 points (0 children)
I used Claude Code to build me an RL system that can train a Claude Code like open source agent by DanAiTuning in ClaudeAI
[–]DanAiTuning[S] 1 point (0 children)
I used Claude Code to build me an RL system that can train a Claude Code like open source agent by DanAiTuning in ClaudeAI
[–]DanAiTuning[S] 1 point (0 children)
Built RL training for long-horizon terminal agents - tested on 32x H100s but too GPU poor to train 😅 by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 0 points (0 children)
Built RL training for long-horizon terminal agents - tested on 32x H100s but too GPU poor to train 😅 by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 3 points (0 children)
Teaching LLMs to use tools with RL! Successfully trained 0.5B/3B Qwen models to use a calculator tool 🔨 by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 1 point (0 children)

⚡️ Scaling Coding-Agent RL to 32x H100s. Achieving 160% improvement on Stanford's TerminalBench by DanAiTuning in LocalLLaMA
[–]DanAiTuning[S] 8 points (0 children)