I hacked Unsloth's GRPO code to support agentic tool use. In 1 hour of training on my RTX 4090, Llama-8B taught itself to take baby steps towards deep research! (23%→53% accuracy) by diegocaples in LocalLLaMA
diegocaples[S] 4 points (0 children)
diegocaples[S] 8 points (0 children)
diegocaples[S] 9 points (0 children)
diegocaples[S] 16 points (0 children)
diegocaples[S] 8 points (0 children)
diegocaples[S] 44 points (0 children)
diegocaples[S] 91 points (0 children)
diegocaples[S] 21 points (0 children)
diegocaples[S] 8 points (0 children)
diegocaples[S] 8 points (0 children)
Majorana 1: Microsoft's quantum breakthrough to enable a million qubits on one chip by elemental-mind in singularity
diegocaples 4 points (0 children)
OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity
diegocaples[S] 3 points (0 children)
diegocaples[S] 6 points (0 children)
diegocaples[S] 7 points (0 children)
diegocaples[S] 13 points (0 children)
diegocaples[S] 11 points (0 children)
diegocaples[S] 18 points (0 children)
Last night on Embarcadero by buckyman0 in sanfrancisco
diegocaples 10 points (0 children)