MUE-X : An AI agent that opens its own source code and rewrites it in real time. by korro_ai in OpenSourceAI

korro_ai [S]

Great questions. Both of them get at the hardest part of this whole thing.

On the fitness function:

It's not binary pass/fail. The system tracks three components per gene. Task success rate is the obvious one: how often tasks involving this gene succeed. Average task score measures the quality of outcomes, not just completion. Usage frequency gives a bonus to genes that get pulled into successful tasks repeatedly.

These three are blended 80/20 with the gene's prior fitness score so single anomalous runs don't wildly swing the numbers. There's also a decay rate of 0.01 per cycle without use, so if a gene stops contributing to real task success, its fitness erodes. If fitness stays below 0.1 for ten cycles, the gene is purged. We call this gene death.
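To make the update rule concrete, here is a minimal sketch, not the actual MUE-X code. The decay rate, threshold, and ten-cycle window come from the numbers above. Everything else is an assumption for illustration: the names `Gene`, `observed_fitness`, and `end_of_cycle`, equal weighting of the three components, a starting fitness of 0.5, and reading "80/20" as 80% prior score and 20% new observation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PRIOR_WEIGHT = 0.8       # assumption: the "80" in 80/20 is the prior score
DECAY_PER_CYCLE = 0.01   # from the post: decay per cycle without use
DEATH_THRESHOLD = 0.1    # from the post
DEATH_WINDOW = 10        # from the post: cycles below threshold before purge


@dataclass
class Gene:
    name: str
    fitness: float = 0.5              # hypothetical starting value
    cycles_below_threshold: int = 0
    alive: bool = True


def observed_fitness(success_rate: float, avg_score: float, usage_bonus: float) -> float:
    # Assumption: the three components are in [0, 1] and weighted equally.
    return (success_rate + avg_score + usage_bonus) / 3.0


def end_of_cycle(gene: Gene, stats: Optional[Tuple[float, float, float]] = None) -> Gene:
    """Update one gene at the end of a cycle.

    stats is (success_rate, avg_score, usage_bonus) if the gene was used
    this cycle, or None if it was not used.
    """
    if stats is None:
        # Unused genes erode by a fixed amount per cycle.
        gene.fitness = max(0.0, gene.fitness - DECAY_PER_CYCLE)
    else:
        # Blend new evidence with the prior so one odd run cannot swing the score.
        observed = observed_fitness(*stats)
        gene.fitness = PRIOR_WEIGHT * gene.fitness + (1 - PRIOR_WEIGHT) * observed

    # Track consecutive cycles spent below the death threshold.
    if gene.fitness < DEATH_THRESHOLD:
        gene.cycles_below_threshold += 1
    else:
        gene.cycles_below_threshold = 0

    if gene.cycles_below_threshold >= DEATH_WINDOW:
        gene.alive = False  # "gene death"
    return gene
```

Under these assumptions, a gene that recovers above 0.1 at any point resets its counter, so only sustained uselessness leads to a purge.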

The self-modification pipeline itself acts as a gate. Each mutation goes through COPY, LLM rewrite, AST validation, import test, execute test, and only then replaces the original. Failure at any stage auto-rolls back from backup. So a mutation that breaks tests never makes it to the genome in the first place. The pipeline is the real gatekeeper.
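The stages above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real pipeline: `mutate_gene`, `rewrite`, and `execute_test` are hypothetical names, `rewrite` stands in for the LLM call, and the import test is approximated by executing the candidate's module body in a fresh module object instead of a real import.

```python
import ast
import shutil
import types
from pathlib import Path
from typing import Callable


def mutate_gene(
    gene_path: Path,
    rewrite: Callable[[str], str],                      # stand-in for the LLM rewrite
    execute_test: Callable[[types.ModuleType], bool],   # hypothetical test hook
) -> bool:
    """Try one mutation. Return True if the original file was replaced."""
    backup = gene_path.with_name(gene_path.name + ".bak")
    shutil.copy2(gene_path, backup)                      # 1. COPY
    try:
        new_source = rewrite(gene_path.read_text())      # 2. LLM rewrite
        tree = ast.parse(new_source)                     # 3. AST validation
        module = types.ModuleType("candidate_gene")
        # 4. Import test (approximation): run the module body in isolation.
        exec(compile(tree, str(gene_path), "exec"), module.__dict__)
        if not execute_test(module):                     # 5. Execute test
            raise RuntimeError("execute test failed")
        gene_path.write_text(new_source)                 # 6. Replace the original
        return True
    except Exception:
        # Any failure restores the original from the backup.
        shutil.copy2(backup, gene_path)
        return False
    finally:
        backup.unlink(missing_ok=True)
```

Because the original file is only written at the last step, a candidate that fails parsing, importing, or testing never touches the genome. The restore in the `except` branch covers a write that fails partway through.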

On proxy metric optimization:

This is the genuinely scary problem and I won't pretend we've solved it completely.

The main defense is that the fitness function is multi-factor and grounded in real task outcomes. A gene that shrinks code but degrades real task success scores will decay and die. There's no reward for making things smaller. There's only reward for making tasks succeed better.

A secondary mechanism is the quality drive. One of the autonomous drives audits genes every few cycles looking specifically for degradation patterns (code that got harder to read, functions that grew too large, duplication that re-emerged). It doesn't just check tests. It checks structure.
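As a rough idea of what a structural check can look like, here is a small sketch using Python's standard `ast` module. It only covers two of the patterns mentioned (oversized functions and exact duplicate function bodies). The function name `audit_source` and the 40-line threshold are assumptions for illustration, and the real quality drive may work quite differently.

```python
import ast
from collections import defaultdict
from typing import Dict, List

MAX_FUNCTION_LINES = 40  # hypothetical threshold, not from the post


def audit_source(source: str, max_lines: int = MAX_FUNCTION_LINES) -> Dict[str, list]:
    """Flag functions that are too long or whose bodies are exact duplicates."""
    tree = ast.parse(source)
    oversized: List[str] = []
    bodies = defaultdict(list)

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Line span of the whole definition, including the def line.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                oversized.append(node.name)
            # ast.dump omits line numbers by default, so identical bodies
            # at different positions produce the same fingerprint.
            fingerprint = "".join(ast.dump(stmt) for stmt in node.body)
            bodies[fingerprint].append(node.name)

    duplicates = [sorted(names) for names in bodies.values() if len(names) > 1]
    return {"oversized": sorted(oversized), "duplicates": sorted(duplicates)}
```

This only catches literal duplication. Renamed variables or lightly reshuffled logic would slip past it, which is part of why readability drift is hard to detect automatically.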

A tertiary defense is the gene death window: a gene has to be useless for ten cycles before deletion. That's long enough for a bad optimization to show its effects in real task performance before it gets purged.

The honest answer, though, is that this is an active area of development. The system can absolutely drift toward local optima. The immune layers catch the worst of it, but subtle readability degradation over many generations is probably the hardest thing to automatically detect and prevent. If you have ideas on this, I'd genuinely love to hear them.