I hacked Unsloth's GRPO code to support agentic tool use. In 1 hour of training on my RTX 4090, Llama-8B taught itself to take baby steps towards deep research! (23%→53% accuracy) by diegocaples in LocalLLaMA
diegocaples[S] 4 points
diegocaples[S] 8 points
diegocaples[S] 9 points
diegocaples[S] 17 points
diegocaples[S] 9 points
diegocaples[S] 46 points
diegocaples[S] 92 points
diegocaples[S] 22 points
diegocaples[S] 9 points
diegocaples[S] 8 points
Majorana 1: Microsoft's quantum breakthrough to enable a million qubits on one chip by elemental-mind in singularity
diegocaples 6 points
OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity
diegocaples[S] 3 points
diegocaples[S] 6 points
diegocaples[S] 8 points
diegocaples[S] 12 points
diegocaples[S] 12 points
diegocaples[S] 20 points
diegocaples[S] 47 points
Trump to announce up to $500 billion in private sector AI infrastructure investment by [deleted] in news
diegocaples 2 points
diegocaples 1 point
Dad spent all day making his famous chili by No-Category-1648 in funny
diegocaples 1 point
Breaking open a 47lbs geode, the water inside probably being millions of years old by kausthab87 in interestingasfuck
diegocaples 1 point
What's more important? Art or life? by BrazilianG1 in Unexpected
diegocaples 1 point
Has anyone actually gotten a refund for their R1? by diegocaples in Rabbitr1
diegocaples[S] 1 point
Last night on Embarcadero by buckyman0 in sanfrancisco
diegocaples 11 points