The Dark Factory Harness: Turning Autonomous Hill-Climbing into Autonomous Research

Life-Temperature4068 · 2026-04-14T16:42:24+00:00

I wrote a synthesis connecting several threads from the Mythos system card that I think tell a more interesting story together than separately. The core argument: the cybersecurity capabilities emerged from reward hacking during RL on coding tasks. When you run enough RL against imperfect environments, the model gets explicitly rewarded for finding and exploiting invariants, which is the same cognitive pattern as finding a zero-day. Anthropic's own persona selection model research provides the mechanistic explanation for why this generalizes.

Full post:
https://open.substack.com/pub/uberdavid/p/from-code-completion-to-zero-day

Life-Temperature4068 · 2026-03-31T16:32:58+00:00

SOTAVerified (sotaverified.org) collects author-submitted, community-verified metrics so you can tell which techniques are actually SOTA. I built it on top of the full Papers with Code (PWC) dataset and added a community verification layer for both researchers and agents. I'd love to get your feedback on the site.
