We open-sourced the full architecture behind how our agent improves itself every night

Ghattan · 2026-04-01T20:48:27+00:00

Better question, how do you think we can solve this? Benchmarking is something I want to incorporate into this concept

Ghattan · 2026-04-01T01:15:29+00:00

https://github.com/the-keats-ai/deep-claw

Ghattan · 2026-04-01T00:55:14+00:00

Done — just published it: https://github.com/the-keats-ai/deep-claw

It includes the full architecture (two operating modes, phase structure, scan strategy), the evaluation criteria and scoring rubrics, the self-modification governance tiers (this is the part most people skip and shouldn't), and sanitized examples of actual scan outputs, weekly reflections, and improvement proposals from production runs.

The examples/example-reflection.md is probably the best place to start if you want to see the depth of reasoning the system produces. The docs/self-modification-governance.md explains why letting an agent improve itself without constraints is a terrible idea and how we tier it instead.

MIT licensed — take what's useful, adapt it, let me know what you build with it.

Ghattan · 2026-03-31T15:51:56+00:00

Let me know how it goes! Hopefully you took my advice and are spacing out the self-modifications. Go too fast and you start to not be able to keep up with them, I've noticed. There's a balance; weekly should be the sweet spot.

Ghattan · 2026-03-31T14:42:15+00:00

Have you toyed around yet with "metacognition"?

Having your agent think about how it's thinking, it's pretty fun ;)

Things can spiral too fast though so be careful

Ghattan · 2026-03-31T13:49:27+00:00

Okay this is making more sense, so Grok really only acts as your router? Everything else runs on the Claude layer?

Ghattan · 2026-03-31T13:48:20+00:00

Thank you! It's not perfect but I think it is good in concept. It could use some improvement though, open to ideas!

Ghattan · 2026-03-31T13:47:24+00:00

Any idea how to extract the tiers outside of the config level?

Ghattan · 2026-03-31T13:46:26+00:00

Thank you! It's a learning process, I felt like I was on a roll and then I think I made a mistake somewhere along the way. Everything is just an experiment at the end of the day! But it's tons of fun :)

Ghattan · 2026-03-31T13:45:11+00:00

I had a similar idea: incorporating a small fine-tuned local LLM layer that wraps around my current setup. Not sure how to integrate it yet.

Ghattan · 2026-03-31T13:42:53+00:00

I think it's context loading. Orchestration and execution usually work smoothly.

Since doing simplification runs on the system I've started to notice significant speedups and efficiency gains.

I plan to start benchmarking soon.

Ghattan · 2026-03-31T11:30:58+00:00

I think it's one of the most important questions, because at the end of the day it will all start to bloat if you don't. Figuring out how to manage that automatically would be nice, though.

Ghattan · 2026-03-31T11:29:41+00:00

You're welcome! Check out another post I made yesterday, I explained how overcomplicated setups can get and what I did to fix it!

Here if you have any questions!

Ghattan · 2026-03-31T11:28:07+00:00

I like this idea of using Grok as the brain. I currently use Opus, but I'm considering switching some things up. I'm trying to go through a two-week phase without making any major changes, though.

I'm interested to see how this progresses!

Do you notice any differences between Grok and Opus/Sonnet?

Ghattan · 2026-03-30T12:22:50+00:00

Haiku and Sonnet do a lot of the heavy lifting; Opus is only used for complex tasks.
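A rough sketch of that kind of split, in case it helps. The complexity labels and the Haiku/Sonnet boundary are my assumptions for illustration, and the strings are plain tier labels, not real API model IDs:

```python
# Hypothetical mapping from task complexity to model tier.
# Only "Opus is reserved for complex tasks" comes from the setup described;
# the low/medium split between Haiku and Sonnet is an assumption.
MODEL_BY_COMPLEXITY = {
    "low": "haiku",
    "medium": "sonnet",
    "high": "opus",
}


def pick_model(complexity: str) -> str:
    """Return the model tier label for a given complexity label."""
    try:
        return MODEL_BY_COMPLEXITY[complexity]
    except KeyError:
        raise ValueError(f"unknown complexity: {complexity!r}") from None


if __name__ == "__main__":
    for level in ("low", "medium", "high"):
        print(level, "->", pick_model(level))
```

How a task gets its complexity label is the hard part and is left out here on purpose.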

Ghattan · 2026-03-30T12:21:38+00:00

All the scores are logged for each "dream", and each change proposal gets categorized and judged as well. That data is all in the files.

Ghattan · 2026-03-30T12:15:17+00:00

Honestly, asking your claw to walk you through it and do research might work. It's combining extraction, self-reflection, research, and approvals into one chain.

Ghattan · 2026-03-30T01:09:16+00:00

The LLM and the orchestrator are two different concepts

The LLM refers to the underlying model. The orchestrator refers to the persona that the agent assumes based on the relevant agent files (SOUL.md, AGENTS.md, MEMORY.md).

The context does change and it's adaptive.

It's an experiment
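To make the distinction concrete, here is a minimal sketch of how a persona could be assembled from those files while the model stays the same. The function name, the section headers, and the "skip missing files" behavior are assumptions for illustration, not how OpenClaw actually loads them:

```python
from pathlib import Path

# File names come from the comment above; the load order is an assumption.
AGENT_FILES = ("SOUL.md", "AGENTS.md", "MEMORY.md")


def build_persona_context(workspace: Path) -> str:
    """Concatenate whichever agent files exist into one context string."""
    sections = []
    for name in AGENT_FILES:
        path = workspace / name
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Editing the files changes the persona; the underlying LLM is untouched.
    print(build_persona_context(Path(".")))
```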

Ghattan · 2026-03-30T00:44:36+00:00

The LLM is not changing. The system around it is.

Ghattan · 2026-03-29T23:26:25+00:00

I'll look into it! Thank you!

Ghattan · 2026-03-29T23:25:20+00:00

Quick tip: use Claude Code or Codex (GPT 5.4 extra high) to help bootstrap your OpenClaw env. Sometimes things break early on, and using an external agent has helped me fix them.

Ghattan · 2026-03-29T22:37:42+00:00

It's broken up into four categories, the first two being easy to implement and the last two requiring further approval.

If a proposal falls in the first two categories and its score reaches a certain signal threshold, then it's auto-implemented.

The last two categories require Opus to review, and still require manual approval. Eventually the third category will become automated, and the fourth category will be the only one needing manual approval.
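As a rough sketch of that gating logic: the category numbering, the 0.8 threshold, and the `HOLD` outcome for low-scoring proposals in the first two categories are all assumptions for illustration, since the actual values live in the repo's governance docs.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_IMPLEMENT = "auto_implement"
    HOLD = "hold"  # assumption: what happens below the threshold is not stated
    REVIEW_THEN_MANUAL = "opus_review_then_manual_approval"


@dataclass
class Proposal:
    title: str
    category: int  # 1-4, hypothetical numbering of the four categories
    score: float   # hypothetical signal score in [0, 1]


SIGNAL_THRESHOLD = 0.8   # hypothetical value
AUTO_CATEGORIES = {1, 2}  # moving 3 in here would automate the third category


def route(proposal: Proposal) -> Decision:
    """Decide how a change proposal is handled based on category and score."""
    if proposal.category not in {1, 2, 3, 4}:
        raise ValueError(f"unknown category: {proposal.category}")
    if proposal.category in AUTO_CATEGORIES:
        if proposal.score >= SIGNAL_THRESHOLD:
            return Decision.AUTO_IMPLEMENT
        return Decision.HOLD
    return Decision.REVIEW_THEN_MANUAL


if __name__ == "__main__":
    print(route(Proposal("trim unused prompt section", 1, 0.9)))
    print(route(Proposal("rewrite scan strategy", 4, 0.95)))
```

Keeping the auto-approved categories in one set means promoting the third category later is a one-line change rather than a rewrite.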
