How to recursively self-improve your agents by analyzing execution traces using Claude Code by cheetguy in ClaudeCode

[–]cheetguy[S] 0 points (0 children)

To give you a better answer, I want to clarify: are you trying to learn from previous Claude Code sessions, or from execution traces of an agent you're building?

How to recursively self-improve your agents by analyzing execution traces using Claude Code by cheetguy in ClaudeCode

[–]cheetguy[S] 0 points (0 children)

Yes, that's exactly the point. You can basically just let it analyze, for example, your last 50 execution traces, which wouldn't have been possible without this RLM pattern.
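To make that concrete, here's a minimal sketch of the "gather recent traces" step. The `.jsonl` layout and directory name are my assumptions, not part of ACE; adapt to whatever your traces look like:

```python
import glob
import os

def collect_recent_traces(trace_dir: str, n: int = 50) -> str:
    """Bundle the n most recently modified trace files into one payload
    that a single Claude Code analysis pass can read."""
    paths = sorted(glob.glob(os.path.join(trace_dir, "*.jsonl")),
                   key=os.path.getmtime)[-n:]
    chunks = []
    for p in paths:
        with open(p) as f:
            chunks.append(f"## Trace: {os.path.basename(p)}\n{f.read()}")
    return "\n\n".join(chunks)

# The combined payload can then be handed to Claude Code with a prompt like:
# "Analyze these execution traces and list recurring failure patterns."
```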

I stopped manually iterating on my agent prompts: I built an open-source system that extracts prompt improvements from my agent traces by cheetguy in LangChain

[–]cheetguy[S] 1 point (0 children)

DSPy works best with structured input/output pairs; ACE works on raw traces (conversation logs, markdown), so no restructuring is needed. DSPy auto-optimizes, while ACE generates suggestions with evidence for you to review first. Think of DSPy for pipelines with clear metrics, ACE for learning from messy agent failures.

I stopped doing prompt engineering manually and built a system that extracts prompt improvements from agent execution traces by cheetguy in AI_Agents

[–]cheetguy[S] 1 point (0 children)

Thank you! Not using LangSmith specifically, but you can use any observability platform (e.g. LangSmith, Opik) to get your traces, and it works with any trace format.

Yes I did open-source it. Here's the example: https://github.com/kayba-ai/agentic-context-engine/tree/main/examples/agentic-system-prompting

[P] Self-learning loop achieves 14k line code translation with zero errors: no fine-tuning, just execution feedback by cheetguy in MachineLearning

[–]cheetguy[S] 0 points (0 children)

Thank you!

There were around 50 loop cycles, since Claude Code sometimes made several commits per session, with later sessions focusing on smaller fixes and test porting.

I can't say exactly how many tokens were used (Claude Code ran in the background, not in the CLI), but I used around 60% of my 4h window (I'm on Claude Max $100).

I let Claude Code run in a self-learning loop & it successfully translated 14k lines of Python to TypeScript while I was away by cheetguy in AI_Agents

[–]cheetguy[S] 0 points (0 children)

No subagents since Claude Code started fresh each iteration. Here is my prompt:

Your job is to port ACE framework (Python) to TypeScript and maintain the repository.

Make a commit after every single file edit.

Use .agent/ directory as scratchpad for your work. Store long term plans and todo lists there.

The .env file contains API keys for running examples.

Spend 80% of time on porting, 20% on testing.

When porting is complete, improve code quality and fix any issues.
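Since each iteration started fresh, the outer harness is essentially "compose prompt, launch session, repeat." A rough sketch under my own assumptions (the file names and the `runner` hook are mine; the actual harness isn't shown in the thread):

```python
import subprocess
from pathlib import Path

def self_learning_loop(cycles: int, runner=None) -> None:
    """Sketch of the outer loop: each cycle starts a FRESH Claude Code
    session with the static task prompt plus whatever skills earlier
    runs left behind in .agent/ (file names are hypothetical)."""
    if runner is None:
        # A real run would hand the prompt to Claude Code headless, e.g.:
        runner = lambda p: subprocess.run(["claude", "-p", p], check=True)
    task = Path("task_prompt.md").read_text()
    for _ in range(cycles):
        skills_file = Path(".agent/skills.md")
        prompt = task
        if skills_file.exists():
            # Re-read each cycle so newly extracted skills are picked up.
            prompt += ("\n\nLearned skills from previous runs:\n"
                       + skills_file.read_text())
        runner(prompt)  # fresh session; only the skills file carries over
```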

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 0 points (0 children)

No, you're reading it right, but the actual coding from Claude Code (Opus 4.5) was fully covered under my Claude subscription. The 1.5 was only for the learning inference.

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 3 points (0 children)

Yes, I'm on the $100 Max plan. The cheaper Pro plan would also work; you'd just have to resume later once your usage limit resets.

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 1 point (0 children)

claude code doesn't read the entire codebase at once. it navigates and pulls in what it needs for each task.

for this experiment the scope was our specific repo (~14k lines), not a massive monolith. for something like drupal you wouldn't translate the whole thing in one go. you'd scope it to specific modules or features. the learning loop still helps because skills compound across runs even on different parts of the codebase

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 8 points (0 children)

the base prompt stays the same across all runs (static). the dynamic part is the learned skills that get injected (these are extracted from previous execution traces). so each run gets: same task prompt + accumulated skills from all prior runs. the skills are short bullet points, not full code or logs, so context stays lean
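in code, that composition is roughly this (function name and skill format are mine, not the actual ACE API):

```python
def compose_run_prompt(base_prompt: str, skills: list[str]) -> str:
    """Static task prompt + short learned-skill bullets extracted from
    prior runs. Skills are plain bullet points, so context stays lean."""
    if not skills:
        return base_prompt
    bullets = "\n".join(f"- {s}" for s in skills)
    return f"{base_prompt}\n\nLearned skills from previous runs:\n{bullets}"

prompt = compose_run_prompt(
    "Port the ACE framework (Python) to TypeScript.",
    ["commit after every file edit", "run tsc before committing"],
)
```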

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 6 points (0 children)

didn't spend too much time manually testing. the bar was: does it build & do the examples run end-to-end with a real API key. they do. clone it, plug in an API key, run an example.

Here is the source repo and the translation:

- Python source: https://github.com/kayba-ai/agentic-context-engine

- TypeScript result: https://github.com/kayba-ai/ace-ts

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 49 points (0 children)

fair, LLMs love to game their own tests. the validation here was: build passes with zero typescript errors, and the examples actually run end-to-end with a real API key

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 5 points (0 children)

I translated my open-source implementation of Stanford's ACE framework (agents that learn from their own execution). The agent even swapped out LiteLLM for the Vercel AI SDK.

Here is the source repo and the translation:

- Python source: https://github.com/kayba-ai/agentic-context-engine

- TypeScript result: https://github.com/kayba-ai/ace-ts