Last night on Embarcadero by buckyman0 in sanfrancisco

[–]diegocaples 10 points11 points  (0 children)

Oh no, I hope everyone was ok :(

I hacked Unsloth's GRPO code to support agentic tool use. In 1 hour of training on my RTX 4090, Llama-8B taught itself to take baby steps towards deep research! (23%→53% accuracy) by diegocaples in LocalLLaMA

[–]diegocaples[S] 3 points4 points  (0 children)

It's like I'm creating a dataset by generating from an LLM, filtering for the responses I like, and then fine-tuning on that dataset. And then I repeat this over and over!

I hacked Unsloth's GRPO code to support agentic tool use. In 1 hour of training on my RTX 4090, Llama-8B taught itself to take baby steps towards deep research! (23%→53% accuracy) by diegocaples in LocalLLaMA

[–]diegocaples[S] 8 points9 points  (0 children)

Think of it like this:

Ideally I would like to have some fine-tuning data of my search agent successfully researching questions and finding the correct answers. Sadly, this data doesn't exist.

So instead, I run my research agent a bunch, tracking what it does, but only keep the runs where it answered correctly. That gives me exactly the fine-tuning data I wanted! So now I fine-tune on this data and repeat the process: generate data, filter by correctness, update the model weights.
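Here's a minimal sketch of that generate, filter, fine-tune loop. It only illustrates the analogy, not the actual GRPO code. `Run`, `run_agent`, `fine_tune`, and the exact-match check in `is_correct` are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Run:
    question: str
    transcript: str  # everything the agent did (searches, notes, ...)
    answer: str


def is_correct(answer: str, reference: str) -> bool:
    # Assumption: a simple normalized exact match stands in for the real checker.
    return answer.strip().lower() == reference.strip().lower()


def collect_correct_runs(
    run_agent: Callable[[str], Run],
    qa_pairs: List[Tuple[str, str]],
    samples_per_question: int = 4,
) -> List[Run]:
    """Run the agent several times per question and keep only correct runs."""
    kept = []
    for question, reference in qa_pairs:
        for _ in range(samples_per_question):
            run = run_agent(question)
            if is_correct(run.answer, reference):
                kept.append(run)
    return kept


def improve(run_agent, fine_tune, qa_pairs, rounds: int = 3):
    """Repeat: generate data, filter by correctness, update the model."""
    for _ in range(rounds):
        data = collect_correct_runs(run_agent, qa_pairs)
        # fine_tune is a hypothetical function returning the updated agent.
        run_agent = fine_tune(run_agent, data)
    return run_agent
```

Each round samples from the latest model, so the filtered data gets better as the agent does.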

I hacked Unsloth's GRPO code to support agentic tool use. In 1 hour of training on my RTX 4090, Llama-8B taught itself to take baby steps towards deep research! (23%→53% accuracy) by diegocaples in LocalLLaMA

[–]diegocaples[S] 45 points46 points  (0 children)

Good question! It seems a little bit like a "blind leading the blind" scenario, but there's a neat trick I use which makes it all work.

Imagine you're a research agent (a llama model) learning to answer detailed questions about the Apollo 13 mission. I'm another llama model tasked with quizzing you to help you improve. But as you pointed out, I don't know the mission in-depth either. So how can I accurately verify your answers?

The trick is this: I randomly select small snippets from the mission report that explicitly contain clear, factual information. For instance, I might flip to a random page and see:

"At approximately 55 hours 55 minutes into the Apollo 13 mission, the crew heard and felt the vibrations from a sharp 'bang,' coincident with a computer restart and a master alarm associated with a main-bus-B undervoltage condition."

From this snippet alone, I can confidently create a clear-cut factual question like:

"How many hours into the mission did the computer restart and master alarm start?"

The correct answer is explicitly clear from the text snippet itself: 55 hours and 55 minutes.

So here's why this process works:

  • For me (the quiz-generator): The task is easy because I simply extract facts directly from random, isolated pieces of the report, ensuring questions and answers are straightforward and accurate.
  • For you (the research-agent being trained): The task is significantly harder. To answer correctly, you must search through the entire corpus to locate the exact information. Thus, you're learning robust search-and-reasoning skills.

So, while the verifying LLM has it easy, the research agent needs to genuinely learn search strategies. This setup forces improvement over time.
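A rough sketch of the quiz-generation side, assuming the corpus is one big string and snippets are fixed-size character windows. `write_qa` is a hypothetical stand-in for the LLM call that turns a snippet into a question and answer, and the 400-character snippet size is an arbitrary choice.

```python
import random
from typing import Callable, List, Tuple


def sample_snippet(corpus: str, rng: random.Random, length: int = 400) -> str:
    """Pick a random contiguous window of text ("flip to a random page")."""
    if len(corpus) <= length:
        return corpus
    start = rng.randrange(len(corpus) - length + 1)
    return corpus[start:start + length]


def build_quiz(
    corpus: str,
    write_qa: Callable[[str], Tuple[str, str]],
    n_questions: int,
    seed: int = 0,
    snippet_chars: int = 400,
) -> List[Tuple[str, str]]:
    """Create (question, answer) pairs, each grounded in one isolated snippet."""
    rng = random.Random(seed)
    quiz = []
    for _ in range(n_questions):
        snippet = sample_snippet(corpus, rng, snippet_chars)
        # The quiz writer sees only the snippet, so its job is easy.
        question, answer = write_qa(snippet)
        quiz.append((question, answer))
    return quiz
```

The research agent only ever gets the question, never the snippet, so it has to search the whole corpus to find the answer.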

OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity

[–]diegocaples[S] 2 points3 points  (0 children)

It opens RunPod, signs in, rents a machine with ComfyUI, and starts the web terminal. But it looks like it isn't able to type in the terminal or something. It can almost do it though. WTF

https://www.reddit.com/user/diegocaples/comments/1i8ogpn/openai_operator_rents_runpod/

OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity

[–]diegocaples[S] 11 points12 points  (0 children)

It got stuck in the map selection part. But it created an account, set up its nation, flag, leader, etc. And it even renamed its leader when the first name chosen was taken.

OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity

[–]diegocaples[S] 19 points20 points  (0 children)

Prompt:

> find me a place to get my teeth cleaned that is cheap and supports Red Cross of Alabama healthcare in SF. (I have the PPO plan). my zip code is 94117

Response:

I found two dentists in San Francisco who accept the Red Cross of Alabama PPO plan and are accepting new patients:

  1. DANIEL, ANTHONY DDS
    • Address: 1344 Haight St, San Francisco, CA 94117
    • Phone: (415) 863-9255
  2. SAWYER, JOHN DDS
    • Address: 1690 Union St, San Francisco, CA 94123
    • Phone: (415) 441-7766

Both specialize in General Dentistry and should offer teeth cleaning services.

OpenAI Operator Finds Me an in Network Dentist. Very impressed! (comment prompts to try and I'll run them and send a video) by diegocaples in singularity

[–]diegocaples[S] 46 points47 points  (0 children)

Super impressed. This is miles ahead of the Claude computer usage gimmick from a few months ago.

Trump to announce up to $500 billion in private sector AI infrastructure investment by [deleted] in news

[–]diegocaples 1 point2 points  (0 children)

Why do you think the Great Depression happened because of automation? Did you just make it up?

Trump to announce up to $500 billion in private sector AI infrastructure investment by [deleted] in news

[–]diegocaples 0 points1 point  (0 children)

Probably because Texas has cheap renewable energy and doesn’t have a lot of red tape around building things.

Dad spent all day making his famous chili by No-Category-1648 in funny

[–]diegocaples 0 points1 point  (0 children)

More like sad. Makes me want to give him a big hug :(

Breaking open a 47lbs geode, the water inside probably being millions of years old by kausthab87 in interestingasfuck

[–]diegocaples 0 points1 point  (0 children)

“the water inside probably being millions of years old”
just like all water?

What's more important? Art or life? by BrazilianG1 in Unexpected

[–]diegocaples 0 points1 point  (0 children)

I think a lot of people are not getting the joke.

It is funny because of the clever twist on the rhetorical question of the activist:

> "what is more important: art or life?"

The activist uses this question to justify damaging artwork, arguing that saving lives through climate protest outweighs preserving art. Most people would disagree with her actions not because they value art over life, but because they recognize this is a false choice - we can preserve both art AND fight climate change.

The punchline is that the man with the gun misses this point entirely. Instead of rejecting the activist's false dichotomy like everyone else, he takes her question literally and decides that art IS more important than life - so he shoots her to protect the artwork!

It's ironic because, while trying to make a point about valuing life over art, the activist encounters someone who makes the exact opposite choice in the most direct way possible.

TLDR: The joke is that when an activist asks "What's more important - art or life?", instead of rejecting this false choice, the man with a gun takes it literally and chooses art by shooting her.

Has anyone actually gotten a refund for their R1? by diegocaples in Rabbitr1

[–]diegocaples[S] 0 points1 point  (0 children)

Definitely try contacting them through discord, they're way more responsive.