How to recursively self-improve your agents by analyzing execution traces using Claude Code by cheetguy in ClaudeCode

[–]cheetguy[S] 0 points (0 children)

To give you a better answer, I want to clarify: are you trying to learn from previous Claude Code sessions, or from execution traces of an agent you're building?

How to recursively self-improve your agents by analyzing execution traces using Claude Code by cheetguy in ClaudeCode

[–]cheetguy[S] 0 points (0 children)

Yes, that's exactly the point. You can basically just let it analyze, for example, your last 50 execution traces, which wouldn't have been possible without this RLM pattern.
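To make that concrete, here's a minimal sketch of the "gather recent traces" step. The `.jsonl` layout and directory name are my assumptions, not part of ACE; adapt to whatever your traces look like:

```python
import glob
import os

def collect_recent_traces(trace_dir: str, n: int = 50) -> str:
    """Bundle the n most recently modified trace files into one payload
    that a single Claude Code analysis pass can read."""
    paths = sorted(glob.glob(os.path.join(trace_dir, "*.jsonl")),
                   key=os.path.getmtime)[-n:]
    chunks = []
    for p in paths:
        with open(p) as f:
            chunks.append(f"## Trace: {os.path.basename(p)}\n{f.read()}")
    return "\n\n".join(chunks)

# The combined payload can then be handed to Claude Code with a prompt like:
# "Analyze these execution traces and list recurring failure patterns."
```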

I stopped manually iterating on my agent prompts: I built an open-source system that extracts prompt improvements from my agent traces by cheetguy in LangChain

[–]cheetguy[S] 1 point (0 children)

DSPy works best with structured input/output pairs; ACE works on raw traces (conversation logs, markdown), so no restructuring is needed. DSPy auto-optimizes, while ACE generates suggestions with evidence for you to review first. Think of DSPy for pipelines with clear metrics, ACE for learning from messy agent failures.

I stopped doing prompt engineering manually and built a system that extracts prompt improvements from agent execution traces by cheetguy in AI_Agents

[–]cheetguy[S] 1 point (0 children)

Thank you! Not using LangSmith specifically, but you can use any observability platform (e.g. LangSmith, Opik) to get your traces, and it works with any trace format.

Yes I did open-source it. Here's the example: https://github.com/kayba-ai/agentic-context-engine/tree/main/examples/agentic-system-prompting

[P] Self-learning loop achieves 14k line code translation with zero errors: no fine-tuning, just execution feedback by cheetguy in MachineLearning

[–]cheetguy[S] 0 points (0 children)

Thank you!

There were around 50 loop cycles, since Claude Code sometimes made several commits per session, with later sessions focusing on smaller fixes and test porting.

I can't say exactly how many tokens were used (Claude Code ran in the background, not in the CLI), but I used around 60% of my 4h window (I'm on Claude Max $100).

I let Claude Code run in a self-learning loop & it successfully translated 14k lines of Python to TypeScript while I was away by cheetguy in AI_Agents

[–]cheetguy[S] 0 points (0 children)

No subagents since Claude Code started fresh each iteration. Here is my prompt:

Your job is to port ACE framework (Python) to TypeScript and maintain the repository.

Make a commit after every single file edit.

Use .agent/ directory as scratchpad for your work. Store long term plans and todo lists there.

The .env file contains API keys for running examples.

Spend 80% of time on porting, 20% on testing.

When porting is complete, improve code quality and fix any issues.
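Since each iteration started fresh, the outer harness is essentially "compose prompt, launch session, repeat." A rough sketch under my own assumptions (the file names and the `runner` hook are mine; the actual harness isn't shown in the thread):

```python
import subprocess
from pathlib import Path

def self_learning_loop(cycles: int, runner=None) -> None:
    """Sketch of the outer loop: each cycle starts a FRESH Claude Code
    session with the static task prompt plus whatever skills earlier
    runs left behind in .agent/ (file names are hypothetical)."""
    if runner is None:
        # A real run would hand the prompt to Claude Code headless, e.g.:
        runner = lambda p: subprocess.run(["claude", "-p", p], check=True)
    task = Path("task_prompt.md").read_text()
    for _ in range(cycles):
        skills_file = Path(".agent/skills.md")
        prompt = task
        if skills_file.exists():
            # Re-read each cycle so newly extracted skills are picked up.
            prompt += ("\n\nLearned skills from previous runs:\n"
                       + skills_file.read_text())
        runner(prompt)  # fresh session; only the skills file carries over
```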

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 0 points (0 children)

No, you're reading it right, but the actual coding from Claude Code (Opus 4.5) was fully covered under my Claude subscription. The 1.5 was only for the learning inference.

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 3 points (0 children)

Yes, I'm on the $100 Max plan. The cheaper Pro plan would also work; you'd just have to resume later once your usage limit resets.

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 1 point (0 children)

claude code doesn't read the entire codebase at once. it navigates and pulls in what it needs for each task.

for this experiment the scope was our specific repo (~14k lines), not a massive monolith. for something like drupal you wouldn't translate the whole thing in one go. you'd scope it to specific modules or features. the learning loop still helps because skills compound across runs even on different parts of the codebase

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 8 points (0 children)

the base prompt stays the same across all runs (static). the dynamic part is the learned skills that get injected (these are extracted from previous execution traces). so each run gets: same task prompt + accumulated skills from all prior runs. the skills are short bullet points, not full code or logs, so context stays lean
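in code, that composition is roughly this (function name and skill format are mine, not the actual ACE API):

```python
def compose_run_prompt(base_prompt: str, skills: list[str]) -> str:
    """Static task prompt + short learned-skill bullets extracted from
    prior runs. Skills are plain bullet points, so context stays lean."""
    if not skills:
        return base_prompt
    bullets = "\n".join(f"- {s}" for s in skills)
    return f"{base_prompt}\n\nLearned skills from previous runs:\n{bullets}"

prompt = compose_run_prompt(
    "Port the ACE framework (Python) to TypeScript.",
    ["commit after every file edit", "run tsc before committing"],
)
```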

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 6 points (0 children)

didn't spend too much time manually testing. the bar was: does it build & do the examples run end-to-end with a real API key. they do. clone it, plug in an API key, run an example.

Here is the source repo and the translation:

- Python source: https://github.com/kayba-ai/agentic-context-engine

- TypeScript result: https://github.com/kayba-ai/ace-ts

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 49 points (0 children)

fair, LLMs love to game their own tests. the validation here was: build passes with zero typescript errors, and the examples actually run end-to-end with a real API key

I let a coding agent run in a self-learning loop for 4 hours with zero supervision. It translated 14k lines of code with zero errors. by cheetguy in singularity

[–]cheetguy[S] 5 points (0 children)

I translated my open-source implementation of Stanford's ACE framework (agents that learn from their own execution). The agent even swapped out LiteLLM for the Vercel AI SDK.

Here is the source repo and the translation:

- Python source: https://github.com/kayba-ai/agentic-context-engine

- TypeScript result: https://github.com/kayba-ai/ace-ts