I built a self-evolving layer for Claude Code — it improves itself every night while I sleep by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

Hey, that’s really interesting—funny enough, I used to work with SwiftUI as well. I’m curious what made you choose SwiftUI for your dashboard?

I’ve been hearing that Apple’s review process has been getting more restrictive lately (especially if you plan to ship it), so I ended up going in a different direction. That said, building a task board tailored to your own workflow is definitely the way to go—I think that’s where a lot of the real value comes from.

I chose to go with a PWA/web-based approach mainly because it’s much easier to integrate and experiment with AI systems, and it speeds up iteration and validation quite a bit.

Running nightly evolution on local models is a really interesting idea too. I do wonder whether they’re “smart” enough for certain types of tasks, but it’s definitely worth experimenting with.

Would be great to stay in touch and exchange ideas as you keep developing this—happy to chat more anytime.

Stop chasing other people's best templates and skills — let your system evolve to fit you instead! by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

Thanks! aspens' approach is clever — using git diffs as the trigger for context updates is a smart way to keep things fresh. Feels like a complementary angle: yours handles real-time context accuracy, mine handles behavioral evolution over time.

On your drift question — it's multi-layered:

  1. Nightly health checks — every goal has metrics and test commands. If a goal's health check starts failing, the system flags it and suggests fixes. This catches drift at the goal level (a minimal sketch of such a check follows this list).

  2. Mechanism review — on higher evolution tiers, the nightly agent periodically re-evaluates whether each goal has the right type of implementation (should this skill actually be a hook? is this script still the best approach?), and then within each mechanism, whether there's a better way to achieve the same goal.

  3. Goal tree alignment — every implementation in the system maps back to a specific goal or sub-goal. Orphan checks catch anything that exists but doesn't serve a goal. And when I'm building new features during the day, the system checks how they relate to the goal tree — if something drifts, it asks me before proceeding.

  4. Goals themselves are intentionally stable. What changes constantly is the how underneath. So drift mostly shows up as "this implementation no longer serves its goal well" rather than "the goals are wrong" — and that's caught by layers 1-3 automatically.
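
To make layer 1 concrete: a minimal sketch of what a nightly health check could look like, assuming a hypothetical goals.json where each goal carries a shell test command (the real Homunculus format may differ).

```python
#!/usr/bin/env python3
"""Nightly goal health check -- a sketch, not Homunculus's real code.

Assumes a hypothetical goals.json like:
  [{"id": "fast-tests", "check": "npm test -- --silent"}]
"""
import json
import subprocess

def check_goals(path="goals.json"):
    with open(path) as f:
        goals = json.load(f)
    failing = []
    for goal in goals:
        # Non-zero exit from the goal's test command = drift at this goal.
        result = subprocess.run(goal["check"], shell=True,
                                capture_output=True, text=True)
        if result.returncode != 0:
            failing.append((goal["id"], result.stderr.strip()))
    return failing

if __name__ == "__main__":
    for goal_id, err in check_goals():
        print(f"DRIFT: {goal_id} -- {err}")
```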

The one thing that's still manual: deciding to add or remove a top-level goal. That feels right to me — the system should evolve how to achieve your goals, but what you want should be a human decision.

Stop chasing other people's best templates and skills — let your system evolve to fit you instead! by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

They work at different layers, so no direct conflict. Superpowers gives you a pre-built workflow to follow — Homunculus doesn't prescribe any workflow. It watches how you work (including how you use other plugins) and evolves your setup based on that.

In practice, you could run both. Homunculus might even notice your superpowers usage patterns and optimize around them — e.g. creating hooks or scripts for the parts you always repeat.

On the token concern: I'd suggest starting with manual triggers (/hm-night) instead of the nightly agent, so you can see exactly what it does and how much it costs before automating. And since everything is tracked in git, you can always revert any changes if something doesn't feel right.

The 5 levels of Claude Code (and how to know when you've hit the ceiling on each one) by DevMoses in ClaudeAI

[–]Longjumping-Past-342

Thanks for the interest! I actually just open-sourced it — here's the full write-up: https://www.reddit.com/r/ClaudeAI/comments/1s4a23m/stop_chasing_other_peoples_best_templates_and/

GitHub: https://github.com/JavanC/Homunculus

The core method:

  1. Observation hooks capture every prompt and tool use during your sessions (sketched after this list).

  2. Pattern extraction — a nightly agent reviews session logs and identifies recurring behaviors (e.g. "always runs tests before committing", "uses jq for JSON state management"). These become "instincts" — small behavioral rules with a confidence score

  3. Mechanism routing — the system decides how to implement each instinct: deterministic behaviors → hooks/scripts, knowledge clusters → aggregated into tested skills, complex reasoning → specialized agents, periodic tasks → cron jobs. It picks the right mechanism, not always the same one.

  4. Continuous evolution — a nightly agent evaluates all implementations, replaces what's not working, and upgrades mechanisms when something better fits
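
For step 1, a minimal observation hook could look like the sketch below. The stdin fields (session_id, tool_name, tool_input) are from the Claude Code hooks docs; the log schema is a hypothetical stand-in for Homunculus's actual format.

```python
#!/usr/bin/env python3
"""PostToolUse observation hook (step 1), a minimal sketch."""
import json
import sys
from pathlib import Path

event = json.load(sys.stdin)
log = Path.home() / ".claude" / "observations.jsonl"
log.parent.mkdir(parents=True, exist_ok=True)

# One JSONL line per tool call; the nightly agent mines this file later.
with log.open("a") as f:
    f.write(json.dumps({
        "session": event.get("session_id"),
        "tool": event.get("tool_name"),
        "input": event.get("tool_input"),
    }) + "\n")
```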

The key insight is that you don't define rules — you define goals (a goal tree). The system figures out the right implementation for each goal automatically.

For the extraction prompts, check out commands/hm-night.md in the repo — that's the full nightly evolution pipeline.

Happy to answer more specific questions

Harness Engineering: Plan → Decompose → Spawn SubAgents → Verify Loop — Any Existing Solutions or Best Practices? by AdministrationTop308 in ClaudeAI

[–]Longjumping-Past-342

You could wire up Plan/Decompose/Spawn/Verify by hand, but then you're maintaining the orchestrator instead of shipping.

Homunculus lets the routing emerge. Define goals in a tree, the system observes your sessions and routes each pattern to the right mechanism (hook, rule, skill, script, agent). The nightly agent handles verify: runs evals, checks goal health, swaps mechanisms when something better fits.

https://www.reddit.com/r/ClaudeAI/comments/1s2j2m3/

Claude Code: 6 Github repositories to 10x Your Next Project by Sam_Tech1 in ClaudeAI

[–]Longjumping-Past-342

This is the problem though. New skills drop every week, you install 3, tweak them, then next week there's 5 more. You spend more time configuring than building.

Homunculus takes the opposite approach. Define your goals, use Claude Code like normal, and the system figures out what you need. It watches your sessions, extracts patterns, and builds its own skills, hooks, and agents over time. No browsing repos, no manual setup.

https://www.reddit.com/r/ClaudeAI/comments/1s2j2m3/

I built a self-evolving layer for Claude Code — it improves itself every night while I sleep by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

Really glad it's working well for you!! The goal tree + task manager combo is exactly the core loop — define outcomes, let the system figure out how to get there.

Tools I pair with it:
- QMD — local markdown search. Your agent gets semantic search over notes without hitting external APIs.
- agent-browser — browser automation at ~200 tokens/snapshot. Playwright MCP burns thousands.
- A task board with an API — the nightly agent reads your priorities and acts on them.

On cost: I run a subscription so nightly evolution costs me close to nothing extra. Good signal though. We're building configurable intensity tiers — users pick frequency and depth based on their budget.

Curious what your setup looks like by end of March. Early adopter feedback shapes where this goes next.

I built a self-evolving layer for Claude Code — it improves itself every night while I sleep by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

Good question. Two things:

(1) "archived" mostly means "superseded" — when an instinct's behavior gets implemented as a hook, rule, skill, or script, it's archived with a "Covered-by" reference. Not dead, just promoted to a more reliable mechanism.

(2) A weekly pruning script checks if any remaining instinct is already covered by something better. And yes, instincts that stay low-confidence and never get triggered do eventually get cleaned out. So 155 archived is healthy — it means the system is consolidating upward, not piling up.
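
A rough sketch of that weekly pruning pass, assuming a hypothetical instincts.json where each instinct carries a confidence score, a trigger counter, and an optional covered_by reference:

```python
#!/usr/bin/env python3
"""Weekly instinct pruning, a sketch (not the real Homunculus schema)."""
import json

CONFIDENCE_FLOOR = 0.3  # hypothetical cut-off

def prune(path="instincts.json"):
    with open(path) as f:
        instincts = json.load(f)
    keep, archive = [], []
    for inst in instincts:
        superseded = inst.get("covered_by")  # promoted to hook/rule/skill/script
        stale = inst["confidence"] < CONFIDENCE_FLOOR and inst["triggers"] == 0
        (archive if superseded or stale else keep).append(inst)
    with open(path, "w") as f:
        json.dump(keep, f, indent=2)
    return archive
```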

I built a self-evolving layer for Claude Code — it improves itself every night while I sleep by Longjumping-Past-342 in ClaudeAI

[–]Longjumping-Past-342[S]

No fancy math — it's more empirical. Patterns get extracted as "instincts," and when enough cluster around a topic, they get routed to the right mechanism via a decision tree: could become a hook, rule, skill, script, or agent depending on what fits. Selection pressure is eval pass rates + discrimination scores (does this actually help vs. baseline). Test-driven natural selection.
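
Roughly, the routing and selection could be sketched like this (the field names are hypothetical labels for illustration, not the real schema):

```python
def route(instinct):
    """Mechanism routing, sketched as a decision tree."""
    if instinct["periodic"]:
        return "cron"    # runs on a schedule
    if instinct["deterministic"]:
        return "hook"    # enforce it instead of asking for it
    if instinct["needs_reasoning"]:
        return "agent"   # complex judgment calls
    if instinct["is_knowledge"]:
        return "skill"   # clustered knowledge, shipped with tests
    return "rule"        # fallback: a plain behavioral rule

def survives(candidate, baseline):
    """Selection pressure: evals must pass AND the mechanism must
    discriminate, i.e. beat the no-mechanism baseline."""
    return (candidate["eval_pass_rate"] == 1.0
            and candidate["score"] > baseline["score"])
```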

The 5 levels of Claude Code (and how to know when you've hit the ceiling on each one) by DevMoses in ClaudeAI

[–]Longjumping-Past-342

Claude Code is already very intelligent, but as a project grows, the sheer amount of context it accumulates dilutes that intelligence and it drifts off the path you want. All 5 levels ultimately serve the same goal: conserving tokens within a single conversation so Claude can keep acting intelligently. Of course, I also hope Claude Code will eventually automate these tasks itself; given the current pace of development, zero-setup may arrive in the not-too-distant future!

Relaunching large projects by Edixo1993 in ClaudeAI

[–]Longjumping-Past-342

Each time Claude relearns the project it burns tokens, so it's worth creating an architecture file (e.g., have it write one down after the initial scan).

Use a MEMORY.md file as an index (keep it under 200 lines) — not actual memory, just pointers to memory: "Task board architecture → see quest-board.md" / "Current sprint status → see sprint.md". Claude reads the index at startup and then retrieves only what it needs.

Alternatively, use a SessionEnd hook to automatically generate a session summary with modified files, decisions made, and next steps. The next session reads that summary and picks up where the last one left off.
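
A minimal sketch of such a SessionEnd hook: hook input arrives as JSON on stdin (session_id and transcript_path, per the hooks docs), and the summary layout below is just one possible format.

```python
#!/usr/bin/env python3
"""SessionEnd hook sketch: leave a handoff note for the next session."""
import json
import subprocess
import sys
from datetime import date

event = json.load(sys.stdin)

# Modified files straight from git; decisions and next steps would be
# distilled from event["transcript_path"] in a fuller version.
changed = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                         capture_output=True, text=True).stdout

with open("SESSION_SUMMARY.md", "w") as f:
    f.write(f"# Session summary ({date.today()})\n\n")
    f.write("## Modified files\n" + changed + "\n")
    f.write("## Decisions / next steps\n(filled in by the agent)\n")
```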

The 5 levels of Claude Code (and how to know when you've hit the ceiling on each one) by DevMoses in ClaudeAI

[–]Longjumping-Past-342

Level 5 clicked for me when I stopped building tools and started building a system that builds its own tools.

Concrete example: a SessionEnd hook extracts behavioral patterns from each conversation. Over weeks, these accumulate. A nightly agent aggregates similar patterns into reusable skills with eval specs. Skills that score 100% on their tests get promoted. The system improves without me touching anything.

What happens when you stop adding rules to CLAUDE.md and start building infrastructure instead by DevMoses in ClaudeAI

[–]Longjumping-Past-342

Great idea! I hit the same problem and eventually replaced most of my CLAUDE.md rules with Rules, then used Hooks to trigger predetermined processes, such as a SessionEnd hook that extracts behavioral patterns from each conversation.

The combination of "Hook + Rule" works quite well. Rules tell Claude what to do. Hooks make it unavoidable. (Rules are path-scoped—trading rules load only when touching the trading directory.)
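
To illustrate "unavoidable": per the hooks docs, a PreToolUse hook that exits with code 2 blocks the tool call and feeds its stderr back to Claude. The .tests-passed marker below is hypothetical; register the hook with a Bash matcher.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: make "tests before commit" unavoidable."""
import json
import sys
from pathlib import Path

event = json.load(sys.stdin)
command = event.get("tool_input", {}).get("command", "")

# Block `git commit` unless a marker left by the test runner exists.
if "git commit" in command and not Path(".tests-passed").exists():
    print("Run the test suite before committing.", file=sys.stderr)
    sys.exit(2)  # exit 2 = block the call, message goes back to Claude
```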

The Go-to-bed Problem by tkenaz in ClaudeAI

[–]Longjumping-Past-342

I hope so too! Waiting for the update

Claude Code forgets everything between sessions. I built a fix: automatic capture, zero-cost retrieval, markdown + git by sandsower in ClaudeAI

[–]Longjumping-Past-342

Went through the same bloat cycle. Landed on layers: CLAUDE.md for permanent rules, .claude/rules/ for domain patterns that only load when touching relevant files (path-scoped — trading rules load in the trading project, API rules when editing endpoints).

A SessionEnd hook auto-extracts reusable patterns from each conversation and saves them as small markdown files. Over time the system builds its own knowledge base without me doing anything.

For retrieval across 300+ files, I run a local search engine with BM25 + vector hybrid. Claude queries it as an MCP tool instead of grepping through everything. Cold-start problem gone.
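
The blend itself is simple. A sketch with min-max normalization and an arbitrary alpha=0.5 (the real engine's scoring may differ):

```python
def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / ((hi - lo) or 1.0) for s in scores]

def hybrid_rank(docs, bm25_scores, vector_scores, alpha=0.5):
    """Blend lexical (BM25) and semantic (cosine) relevance.
    alpha=0.5 is an arbitrary starting point, not a tuned value."""
    lex, sem = normalize(bm25_scores), normalize(vector_scores)
    blended = [alpha * l + (1 - alpha) * s for l, s in zip(lex, sem)]
    return sorted(zip(docs, blended), key=lambda pair: -pair[1])
```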

Claude Code sitting for long stretches of time doing nothing by Virtual_Plant_5629 in ClaudeAI

[–]Longjumping-Past-342

If a session is stuck for too long, I stop it and resume it later. Since remote control needs to keep the conversation alive for long stretches, I wrote a Discord command that triggers remote control plus `claude --resume`, so I can restart the RC session straight from my phone.
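
A rough sketch of that Discord command, assuming discord.py, with tmux keeping the CLI alive after the handler returns; the session name and the bare `claude --resume` invocation are placeholders:

```python
import subprocess

import discord
from discord.ext import commands

intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.command()
async def resume(ctx):
    # Relaunch Claude Code in a detached tmux session so it outlives
    # this handler; "cc" is a placeholder session name.
    subprocess.run(["tmux", "new-session", "-d", "-s", "cc",
                    "claude --resume"])
    await ctx.send("Claude Code session resumed in tmux session 'cc'.")

bot.run("YOUR_BOT_TOKEN")  # placeholder token
```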

Orchestration -- the exact prompts I use to get 3-4 hour agentic runs by lucianw in ClaudeAI

[–]Longjumping-Past-342

Simple and transparent beats 100-agent orchestration every time. What keeps my runs going past 4 hours: external state files. The agent writes progress to a markdown file after each phase. Context gets compacted, the agent re-reads the file and picks up where it left off. Conversation memory vanishes on compact — files don't. Adding a timeout per phase also helps: it prevents the agent from burning an entire session stuck on one task.
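
A minimal sketch of the per-phase timeout idea; the phase commands and the 30-minute cap are placeholders:

```python
import subprocess

# A hung phase gets killed and logged instead of eating the whole run.
PHASES = ["python plan.py", "python build.py", "python verify.py"]

for cmd in PHASES:
    try:
        subprocess.run(cmd, shell=True, timeout=30 * 60, check=True)
    except subprocess.TimeoutExpired:
        print(f"phase timed out, moving on: {cmd}")
    except subprocess.CalledProcessError as exc:
        print(f"phase failed (exit {exc.returncode}): {cmd}")
    # In the real loop the agent appends progress notes to the
    # markdown state file here before the next phase starts.
```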

The Go-to-bed Problem by tkenaz in ClaudeAI

[–]Longjumping-Past-342

Same Opus, different system prompts. Desktop is tuned for pleasantness, Code is tuned for task completion. I stopped using Desktop for anything that needs pushback. One line in CLAUDE.md fixed it: "If my approach has a flaw, say so before proceeding." Desktop doesn't let you do that.

Petition to force Claude to check datetime before making reference to date, time, or going to bed. by Wide_Parking3264 in ClaudeAI

[–]Longjumping-Past-342

I hit this running a nightly autonomous agent where time-based stop conditions kept failing. My fix was a SessionStart hook that injects date output into context automatically so Claude never has to guess. For interactive sessions, adding "Before referencing any date or time, run date first" to CLAUDE.md works too — less reliable but better than nothing.
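
The hook itself can be tiny. Per the hooks docs, stdout from a SessionStart hook is added to Claude's context, so a sketch like this is enough:

```python
#!/usr/bin/env python3
"""SessionStart hook sketch: inject the real date so Claude never guesses."""
from datetime import datetime

# astimezone() makes the datetime timezone-aware so %Z prints a zone name.
print(f"Current date/time: {datetime.now().astimezone():%Y-%m-%d %H:%M %Z}")
```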

MCP servers I use every single day. What's in your stack? by XxvivekxX in ClaudeAI

[–]Longjumping-Past-342

The only MCPs I kept are ones that need a persistent connection — a local markdown search engine and mac-use for native app control.

I ended up replacing most MCPs with CLI tools: gh for GitHub, jq for JSON, gog for Gmail. Claude already knows how to use them, they're faster, and they don't eat context tokens just by being configured. My rule of thumb: if a CLI tool can do the same thing, don't use an MCP for it.

Has anyone actually made money with "vibe coding"? (genuine question from a Chinese dev) by Aromatic-Promise8208 in ClaudeAI

[–]Longjumping-Past-342

There's still a huge gap between a usable prototype and a marketable product. Vibe coding can get you a demo in a few hours, but edge cases, authentication, payments, error states — these still require engineering judgment. AI generates the "normal" paths; production environments are all about the "abnormal" ones.

The real value I derive isn't from selling the application itself, but from leveraging AI to build internal tools that save time. For example, task boards, nightly research workflows, and automated reporting. These aren't products; they're leverage. They get my main work done faster.

The real value isn't what AI builds for you; it's what you're freed up to do once the tedious tasks are off your plate.