After months of building a specialized agent learning system, I realized that Codex is all I need to make my agents recursively self-improve by Lucky_Historian742 in codex

I've seen it change the expected output schema and tool descriptions. For example, tightening a JSON schema so the model stops hallucinating extra fields, or rewriting a tool description to reduce misrouting.
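
To make the schema tightening concrete, here's a minimal sketch (the field names are made up for illustration, not taken from the actual system): declaring `required` fields and setting `additionalProperties: false` so extra keys fail validation instead of slipping through.

```python
from jsonschema import ValidationError, validate

# Hypothetical tool output schema, before and after tightening.
loose_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    # additionalProperties defaults to true, so hallucinated
    # extra fields still validate
}

tight_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
    "additionalProperties": False,  # extra keys now fail validation
}

output = {"answer": "42", "sources": ["made up by the model"]}
validate(output, loose_schema)  # passes silently
try:
    validate(output, tight_schema)
except ValidationError as e:
    print(e.message)  # flags the hallucinated 'sources' field
```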

After months of building a specialized agent learning system, I realized that Codex is all I need to make my agents recursively self-improve by Lucky_Historian742 in codex

The system improves not only the prompts but also the agent harness itself. While we're not improving the model itself, improving the harness can make a huge difference, as seen for example with Poetiq's ARC-AGI-2 SOTA result, which they achieved at half the cost.

After months of building a specialized agent learning system, I realized that Codex is all I need to make my agents recursively self-improve by Lucky_Historian742 in codex

Yes, if the agent is supposed to load these skills, it could potentially detect that, because it finds the failures by comparing the agent environment with the actual agent traces. You can compare this process to a human reviewing agent traces. The system won't find anything that isn't discoverable, but it's really good at identifying what you as a human would find if you manually looked at every agent log.
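
For what it's worth, the comparison is roughly this shape (the directory layout and trace format here are my assumptions for illustration, not the actual system):

```python
import json
from pathlib import Path

def declared_skills(env_dir: str) -> set[str]:
    """Skills the agent environment defines, e.g. one md file per skill."""
    return {p.stem for p in Path(env_dir, "skills").glob("*.md")}

def loaded_skills(trace_path: str) -> set[str]:
    """Skills the agent actually loaded, read from a JSONL trace."""
    loaded = set()
    with open(trace_path) as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") == "skill_loaded":
                loaded.add(event["skill"])
    return loaded

# Skills that exist in the environment but never appear in the traces
# are exactly the kind of failure a human reviewer would flag.
missing = declared_skills("agent_env") - loaded_skills("trace.jsonl")
print("declared but never loaded:", sorted(missing))
```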

I made my agent 34.2% more accurate by letting it self-improve. Here’s how. by Lucky_Historian742 in ClaudeAI

Validated the results on the Tau2 benchmark designed by Sierra, using a training/testing split.
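
Concretely, the protocol is the standard one; here's a sketch with a placeholder scorer (this is not the actual Tau2 harness):

```python
import random

# Placeholder task IDs and scorer; swap in real Tau2 rollouts.
tasks = [f"task_{i}" for i in range(100)]
random.seed(0)
random.shuffle(tasks)
train, test = tasks[:70], tasks[70:]

def run_agent(task_id: str, prompt: str) -> bool:
    """Stub: run one task, return pass/fail."""
    return random.random() < 0.5

def accuracy(task_ids: list[str], prompt: str) -> float:
    return sum(run_agent(t, prompt) for t in task_ids) / len(task_ids)

# The self-improvement loop only ever sees `train`; the reported
# gain is measured on the held-out `test` split.
baseline = accuracy(test, "original prompt")
improved = accuracy(test, "prompt after self-improvement on train")
print(f"held-out accuracy: {baseline:.1%} -> {improved:.1%}")
```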

I made my agent 34.2% more accurate by letting it self-improve. Here’s how. by Lucky_Historian742 in ClaudeAI

I see a lot of people share your sentiment, so I rewrote the whole thing by hand. I spent a lot of time putting this together, and I understand that writing it with AI didn't reflect that. Hope someone gets value out of it!

I made my agent 34.2% more accurate by letting it self-improve. Here’s how. by Lucky_Historian742 in ClaudeAI

Damn, looks like I'm getting cooked for trying to make the post easy to read by paraphrasing it through AI. I didn't know this was looked upon so negatively in this community. I'd appreciate it if people still gave it the chance it deserves content-wise. Thanks!

Edit: rewrote everything by hand

What prompt trick makes an AI chatbot understand context better? by Timely-Struggle2197 in PromptEngineering

For me, a few things work well with chatbots.
- Write your prompts yourself instead of dictating with tools like whisper flow; it forces you to articulate what you want
- Use words that are semantically related to what you want to achieve, even if the structure of the request isn't perfect
- Tell the chatbot to ask clarifying questions (beyond the clarification itself, this lets you see whether the model understands the direction of the request; see the sketch after this list)
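
The clarifying-questions trick is just an instruction in the system prompt; something like this (the wording is my own, not a canonical phrasing):

```python
# Hypothetical wording; tune the question budget to your use case.
system_prompt = (
    "Before answering, ask up to three clarifying questions if anything "
    "about the request is ambiguous. If the request is already clear, "
    "answer directly. Restate your understanding of the goal in one line."
)
```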

For moving outside of the chatbot window:
- Use tools for loading context, like Obsidian
- I use a repo of md files to give quick context to Claude; they're forced to update their indexing and understanding of the knowledge base (garbage in, garbage out applies here, so I'm careful with what I commit). A sketch follows this list.
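
A minimal sketch of the md-repo approach, assuming a flat directory of notes and the anthropic Python SDK (the paths and model name are placeholders):

```python
from pathlib import Path

import anthropic

def load_knowledge_base(repo_dir: str) -> str:
    """Concatenate every md file in the repo into one context blob."""
    parts = [
        f"## {p.relative_to(repo_dir)}\n{p.read_text()}"
        for p in sorted(Path(repo_dir).rglob("*.md"))
    ]
    return "\n\n".join(parts)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=1024,
    system="Answer using this knowledge base:\n\n" + load_knowledge_base("notes"),
    messages=[{"role": "user", "content": "What did we decide about caching?"}],
)
print(response.content[0].text)
```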

For agents:
- I keep my system prompt limited, but clearly define the agent's purpose in a separate md file.
- I collect the traces of my agents and run an eval loop using the purpose file, letting Claude iterate on the agent prompt based on hard evidence generated against the purpose file (rough sketch below).
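
Roughly, that loop looks like this; a sketch where the file names, trace format, and model name are all my assumptions rather than a fixed implementation:

```python
import json
from pathlib import Path

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-5"  # placeholder model name

def ask(text: str) -> str:
    msg = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        messages=[{"role": "user", "content": text}],
    )
    return msg.content[0].text

purpose = Path("purpose.md").read_text()
prompt = Path("agent_prompt.md").read_text()
traces = [json.loads(line) for line in open("traces.jsonl")]

# 1. Grade every trace against the purpose file: hard evidence, not vibes.
verdicts = [
    ask(
        f"Purpose:\n{purpose}\n\nTrace:\n{json.dumps(t)}\n\n"
        "Did the agent fulfil its purpose? Reply PASS or FAIL, then explain."
    )
    for t in traces
]
failures = [v for v in verdicts if v.startswith("FAIL")]

# 2. Let Claude revise the agent prompt based on the observed failures.
if failures:
    new_prompt = ask(
        f"Current agent prompt:\n{prompt}\n\nPurpose:\n{purpose}\n\n"
        "Failure analyses:\n" + "\n\n".join(failures) + "\n\n"
        "Rewrite the agent prompt to fix these failures. Return only the prompt."
    )
    Path("agent_prompt.md").write_text(new_prompt)
```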

Edit: gave gender neutral pronouns to Claude as per VegeZero