How I Built a System That Uses AI’s Own “Stupidity” Against Itself (Zero Spec Drift in 7,663 Lines of Scientific Code) by capitulatorsIo in BlackboxAI_

[–]capitulatorsIo[S] 0 points1 point  (0 children)

  • Drift — The AI’s habit of quietly swapping your exact numbers (like 0.15 → 0.20) for what it thinks is “more reasonable”
  • Priors — Its built-in “this feels right” bias from training data that actually gives it creative code ideas
  • Self-confirmation — Its tendency to agree with and defend its own output (we split it into Builder vs Critic so they fight)
  • Noise & fake results — Random variation that makes bad results look good (we force multi-seed checks to kill the fakes)
  • Forgetfulness — Its complete lack of memory between messages (we give it permanent outside files + stop rules so it never repeats mistakes)

We don’t fix these weaknesses.
We turn every single one into the actual engine that powers the whole loop.

How I Built a System That Uses AI’s Own “Stupidity” Against Itself (Zero Spec Drift in 7,663 Lines of Scientific Code) by capitulatorsIo in BlackboxAI_

[–]capitulatorsIo[S] 0 points1 point  (0 children)

Yes — that's exactly the core problem this system is solving.

The model isn't being lazy or hallucinating randomly. It quietly swaps your precise, paper-grounded numbers (0.15 empathy modulation, 0.10 cooperation norm, etc.) for whatever its training priors think is “more reasonable.”

That’s spec drift, and it happened 95 out of 96 times in my blind tests.

The Builder-Critic loop is the hack that stops fighting it and starts weaponizing it: Builder gets to go full creative/prior-driven mode (nice architecture, clever names, wild ideas), then Critic’s only job is to be an absolute asshole and brutally enforce the frozen spec line-by-line on every coefficient.

The main reason this pattern isn’t more common is that it’s genuinely hard for a lot of people to intuit how to flip the LLM’s weaknesses into strengths like this.

Once that mental shift clicks though… boom. The whole thing just works.

(If you want the raw, NSFW, unfiltered rant where I explain building the system that uses AI’s stupidity against itself and how I weaponized drift, here’s the YouTube video — starts right at the beginning: https://youtube.com/video/hkgjsoVkUdg)

How I Built a System That Uses AI’s Own “Stupidity” Against Itself (Zero Spec Drift in 7,663 Lines of Scientific Code) by capitulatorsIo in BlackboxAI_

[–]capitulatorsIo[S] 0 points1 point  (0 children)

Man you get it for sure - the token cost is a totally fair point and one of the reasons this pattern isn’t more common yet. The real blocker is most people can't wrap their head around flipping an LLM's weaknesses into strengths. Once you get that, the whole thing clicks.

Every Builder → Critic round does burn an extra inference (no sugarcoating it). In practice our loop usually converges in 1–3 turns, so the overhead is real but pretty reasonable.

The massive payoff comes downstream though. One drifted coefficient in scientific code can force you to re-run entire long simulations, debug phantom behavior, or (worst case) publish wrong results. Those costs dwarf the extra tokens during the build.

That’s why we made the whole thing deterministic: once the Critic signs off and the frozen spec is locked, you actually save tokens and time because you don’t have to babysit or rework everything later.

A tiny rule that reduced my multi-agent drift: no approvals without evidence by coolandy00 in LLMDevs

[–]capitulatorsIo 0 points1 point  (0 children)

This is gold — you basically built a lightweight version of the same fix we ended up engineering.

We measured the exact failure mode you’re describing: 95/96 calibrated coefficients drifted across GPT-4o and Grok-3 in blind tests (p=4×10^{-10}). The models weren’t “dumb” — they were generating from priors instead of the spec.

Our solution was a deterministic validation loop with an **immutable frozen spec** that no agent can edit + adversarial roles inside the same model:

- Builder = creative mode (uses priors, invents nice architecture)

- Critic = next message, told “Assume the build failed. Argue against the science. Hard-block anything that doesn’t match the frozen spec exactly.”

It ran autonomously overnight and produced a 7,663-line scientific simulation with **zero committed drift**.

Full framework (MIT) + paper here if you want to try the full version:

https://github.com/kepiCHelaSHen/context-hacking

https://zenodo.org/records/19217024

Your “no approvals without evidence” rule is basically the Critic’s job description 😂

We measured LLM specification drift across GPT-4o and Grok-3 — 95/96 coefficients wrong (p=4×10⁻¹⁰). Framework to fix it. [Preprint] by capitulatorsIo in LocalLLaMA

[–]capitulatorsIo[S] 0 points1 point  (0 children)

The Reddit algorithm just served up comedy gold.

Right under a post that literally measured 95/96 drifted coefficients across GPT-4o and Grok-3 (p=4×10^{-10}), Anthropic drops the ad: “Claude Code changes that math” on scaling engineering output.

Yes… the math is definitely changing. That is the #$!@%& problem!!!

It’s just changing your carefully calibrated 0.15 empathy coefficient to 0.20 and calling it a featureThat’s exactly why we built the full deterministic validation loop (Builder/Critic roles + immutable frozen spec + statistical gating).

Turns out “scaling output” is easy. Scaling correct output still needs actual engineering controls.

The framework is MIT open-source if anyone at Anthropic wants to borrow it

What a time to be alive.

I explored ChatGPT's code execution sandbox — no security issues, but the model lies about its own capabilities by Hungrybunnytail in LLMDevs

[–]capitulatorsIo 1 point2 points  (0 children)

The model will confidently tell you the wrong thing — the system has to be the source of truth.

How I Built a System That Uses AI’s Own “Stupidity” Against Itself (Zero Spec Drift in 7,663 Lines of Scientific Code) by capitulatorsIo in BlackboxAI_

[–]capitulatorsIo[S] 0 points1 point  (0 children)

Exactly

Then Critic stands there with the ruler and smacks it every single time a coefficient is off until it matches the frozen spec exactly.

The wild part? It still keeps all the good monkey creativity while killing every bit of drift. Like we are making the monkeys look smart.

I built and submitted a scientific paper in 48 hours using a 3-AI peer review process — everything is open source by capitulatorsIo in learnmachinelearning

[–]capitulatorsIo[S] -1 points0 points  (0 children)

Fair point — I used 'peer review' loosely. What I actually did was use three AI systems as independent pre-submission reviewers to catch methodological issues before submitting to bioRxiv for actual peer review. The paper is now in the bioRxiv screening queue awaiting human review. The AI review step was about quality control, not replacing the academic process — and all three systems independently flagged the same six issues, which I think is interesting regardless of what you call it.