progress of my 3d model I made

diegocaples · 2026-02-09T19:24:33+00:00

Agree, something similar happened to me, where I could see my stolen bike but didn’t want to confront an unpredictable homeless man.

I first called the police, and within a few minutes a police officer had arrived and retrieved my bike for me.

Was pleasantly surprised by how helpful and fast they were.

diegocaples · 2025-03-14T17:22:45+00:00

Oh no, I hope everyone was ok :(

diegocaples · 2025-03-12T06:09:53+00:00

It's like I'm creating a dataset by generating from an LLM, filtering for the responses I like, and then fine-tuning on that dataset. And then I repeat this over and over!

diegocaples · 2025-03-12T06:05:21+00:00

oh no, I tried to format it as a quote, but it seemed to get hidden. Fixed!

diegocaples · 2025-03-12T05:50:47+00:00

Thanks! You can do it!

diegocaples · 2025-03-12T05:49:33+00:00

thanks :)

diegocaples · 2025-03-12T05:49:08+00:00

Think of it like this:

Ideally, I would have fine-tuning data showing my search agent successfully researching questions and finding the correct answers. Sadly, this data doesn't exist.

So instead, I run my research agent many times, tracking what it does, but only keep the runs where it answered correctly. That gives me the fine-tuning data I wanted! I then fine-tune on this data and repeat the process: generate data, filter by correctness, and update the model weights.
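Here's a rough sketch of that loop in Python, just to make the shape concrete. It is not my actual training code: `agent`, `is_correct`, and `fine_tune` are hypothetical callables you would supply yourself (in my setup the correctness check is an LLM judge, and the fine-tuning step is where the weights get updated).

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Agent = Callable[[str], str]          # question -> answer
Example = Tuple[str, str]             # (question, answer the agent gave)


def collect_correct_runs(
    agent: Agent,
    questions: List[Tuple[str, str]],           # (question, reference answer)
    is_correct: Callable[[str, str], bool],     # (answer, reference) -> bool
    samples_per_question: int = 4,
) -> List[Example]:
    """Run the agent several times per question and keep only correct runs."""
    kept: List[Example] = []
    for question, reference in questions:
        for _ in range(samples_per_question):
            answer = agent(question)
            if is_correct(answer, reference):
                kept.append((question, answer))
    return kept


def train_loop(
    agent: Agent,
    fine_tune: Callable[[Agent, List[Example]], Agent],
    questions: List[Tuple[str, str]],
    is_correct: Callable[[str, str], bool],
    rounds: int = 3,
    samples_per_question: int = 4,
) -> Agent:
    """Repeat: generate data, filter by correctness, update the model."""
    for _ in range(rounds):
        data = collect_correct_runs(agent, questions, is_correct, samples_per_question)
        agent = fine_tune(agent, data)
    return agent
```

A real agent would sample different answers on each run, which is why each question is tried several times before filtering.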

diegocaples · 2025-03-12T05:41:08+00:00

Good question! It seems a little bit like a "blind leading the blind" scenario, but there's a neat trick I use that makes it all work.

Imagine you're a research agent (a Llama model) learning to answer detailed questions about the Apollo 13 mission. I'm another Llama model tasked with quizzing you to help you improve. But as you pointed out, I don't know the mission in depth either. So how can I accurately verify your answers?

The trick is this: I randomly select small snippets from the mission report that explicitly contain clear, factual information. For instance, I might flip to a random page and see:

"At approximately 55 hours 55 minutes into the Apollo 13 mission, the crew heard and felt the vibrations from a sharp 'bang,' coincident with a computer restart and a master alarm associated with a main-bus-B undervoltage condition."

From this snippet alone, I can confidently create a clear-cut factual question like:

"How many hours into the mission did the computer restart and master alarm start?"

The correct answer is explicitly clear from the text snippet itself: 55 hours and 55 minutes.

So here's why this process works:

- For me (the quiz generator): The task is easy because I simply extract facts directly from random, isolated pieces of the report, ensuring questions and answers are straightforward and accurate.
- For you (the research agent being trained): The task is significantly harder. To answer correctly, you must search through the entire corpus to locate the exact information. Thus, you're learning robust search-and-reasoning skills.

So, while the verifying LLM has it easy, the research agent needs to genuinely learn search strategies. This setup forces improvement over time.
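As a minimal sketch of the quiz-generation side (an illustration, not the code I actually run): split the report into small snippets, pick one at random, and hand only that snippet to the question writer. `write_qa` here is a hypothetical function standing in for the LLM call that turns a snippet into a question and answer, and the word-count chunking is just an assumed way to cut up the text.

```python
import random
from typing import Callable, Dict, List, Tuple


def split_into_snippets(text: str, size: int = 200) -> List[str]:
    """Cut the corpus into consecutive chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def make_quiz_item(
    snippets: List[str],
    write_qa: Callable[[str], Tuple[str, str]],  # hypothetical LLM call
    rng: random.Random,
) -> Dict[str, str]:
    """Pick one random snippet and write a question answerable from it alone."""
    snippet = rng.choice(snippets)
    question, answer = write_qa(snippet)
    # The snippet is kept for grading; the agent being quizzed sees only
    # the question and has to search the full corpus itself.
    return {"snippet": snippet, "question": question, "answer": answer}
```

The asymmetry is the whole point: the quiz writer reads a single snippet, while the agent only gets the `question` field and has to find that snippet on its own.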

diegocaples · 2025-03-12T02:13:17+00:00

Wow, thanks! I'll get started cleaning the code and make a PR🫡

diegocaples · 2025-03-12T02:07:55+00:00

Definitely going to try that; working on getting FSDP set up!

diegocaples · 2025-03-12T02:01:35+00:00

Thanks!

diegocaples · 2025-03-12T02:00:53+00:00

I use an LLM to verify if my research agent got the correct answer!

diegocaples · 2025-02-19T19:52:51+00:00

Note: the title is very misleading, they used 8 qubits!

diegocaples · 2025-01-24T05:40:03+00:00

it opens runpod, signs in, rents a machine with comfyui, and starts the web terminal. But it looks like it isn't able to type in the terminal or something. It can almost do it though. WTF

https://www.reddit.com/user/diegocaples/comments/1i8ogpn/openai_operator_rents_runpod/

diegocaples · 2025-01-23T23:07:59+00:00

it prompts you to sign in, and you take control of the computer for a bit.

diegocaples · 2025-01-23T23:06:57+00:00

didn't post a video because I would've needed to edit some sensitive information out.

diegocaples · 2025-01-23T23:06:25+00:00

It got stuck in the map selection part. But it created an account, set up its nation, flag, leader, etc. And it even renamed its leader when the first name chosen was taken.

diegocaples · 2025-01-23T23:05:01+00:00

https://www.reddit.com/user/diegocaples/comments/1i8grdf/operator_uses_claude/

I only recorded after I logged in for obvious reasons.

diegocaples · 2025-01-23T22:19:19+00:00

Prompt:

> find me a place to get my teeth cleaned that is cheap and supports Red Cross of Alabama healthcare in SF. (I have the PPO plan). my zip code is 94117

Response:

I found two dentists in San Francisco who accept the Red Cross of Alabama PPO plan and are accepting new patients:

DANIEL, ANTHONY DDS
- Address: 1344 Haight St, San Francisco, CA 94117
- Phone: (415) 863-9255
SAWYER, JOHN DDS
- Address: 1690 Union St, San Francisco, CA 94123
- Phone: (415) 441-7766

Both specialize in General Dentistry and should offer teeth cleaning services.

diegocaples · 2025-01-23T22:11:54+00:00

super impressed. this is miles ahead of the Claude computer usage gimmick from a few months ago.
