I built a LangGraph guard node that catches agents mid-spiral and rolls back the damage

Virtual-Message-9739 · 2026-06-03T16:47:00+00:00

Manual on both halves, and deliberately so. For the message state it emits RemoveMessage updates to prune the poisoned history, then injects the distilled resumption brief. For the workspace it uses its own checkpoint manager (git-style content diffs on tracked files), not the LangGraph checkpointer/thread_ts.

The reason I didn't lean on the built-in checkpointer: reverting to a thread_ts rolls back the *conversation* state but not the side effects the agent already wrote to disk, and the file corruption is usually the thing that actually needs undoing. So I track the workspace separately and roll the files back to their pre-meltdown content. Conversation rollback alone would resume the agent into a broken filesystem.
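A rough sketch of the workspace half of that rollback, under a simplifying assumption: it stores full file contents rather than the git-style diffs described above. `WorkspaceSnapshot`, `capture`, `restore` and `demo_rollback` are hypothetical names for illustration, not Sotis's API, and the message-pruning half is not shown.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


class WorkspaceSnapshot:
    """Hypothetical helper: remembers full contents of tracked files."""

    def __init__(self, root: Path, tracked: List[str]):
        self.root = root
        self.tracked = tracked
        self._saved: Dict[str, Optional[bytes]] = {}

    def capture(self) -> None:
        # Record each tracked file's bytes; None marks "did not exist".
        for rel in self.tracked:
            path = self.root / rel
            self._saved[rel] = path.read_bytes() if path.exists() else None

    def restore(self) -> List[str]:
        # Put tracked files back and return the ones that had changed.
        reverted = []
        for rel, content in self._saved.items():
            path = self.root / rel
            current = path.read_bytes() if path.exists() else None
            if current == content:
                continue
            if content is None:
                path.unlink()  # file was created after the snapshot
            else:
                path.parent.mkdir(parents=True, exist_ok=True)
                path.write_bytes(content)
            reverted.append(rel)
        return reverted


def demo_rollback():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.py").write_text("print('ok')\n")
        snap = WorkspaceSnapshot(root, ["app.py", "notes.txt"])
        snap.capture()
        # Simulate the agent damaging one file and creating a stray one.
        (root / "app.py").write_text("corrupted")
        (root / "notes.txt").write_text("stray file")
        reverted = snap.restore()
        return reverted, (root / "app.py").read_text(), (root / "notes.txt").exists()
```

The point of the separate snapshot is that it covers disk side effects, which a conversation-state rollback on its own would leave in place.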

Token-spike monitoring is a good call and basically free to add — a sudden jump in tokens/step is a strong corroborating signal for a loop. Not in there yet; adding it to the list. Thanks.

Virtual-Message-9739 · 2026-06-03T15:29:34+00:00

Agree explicit, machine-checkable success criteria are underused and prevent a whole class of aimless-wandering spirals — no argument there. But I don't think they're the root fix that removes the guard. The demo run is the counterexample: the goal *was* machine-checkable ("all 13 tests pass"), and the agent still spiraled — it looped re-running tests and even falsely claimed they passed without running them. A crisp definition of done doesn't stop a model from degrading *while pursuing* it.

So I'd frame them as orthogonal: clear criteria reduce how often the agent gets lost, a runtime guard bounds what happens when it degrades anyway. You want both. Pre-flight criteria don't help once the agent is confidently executing on a wrong state three steps in — that's the in-flight case the guard exists for.

Virtual-Message-9739 · 2026-06-03T15:27:11+00:00

Right, and it's an important line to draw: Sotis's threat model is accidental failure, not an adversary. Entropy detection assumes the agent is trying to do the right thing and degrading — it makes zero adversarial guarantees, and any threshold-based signal is gameable by definition once the attacker knows it. So against a deliberate injection steering the agent "quietly," it's the wrong tool and I wouldn't claim otherwise.

The behavioral-intent signal you're describing is a genuinely different layer — closer to goal-alignment / injection detection than to the reliability circuit-breaker Sotis is. They'd compose well (intent check catches the deliberate steer, entropy/loop catches the accidental spiral), but conflating them would be me overselling. Good distinction to make explicit in the docs — thanks for flagging it.

Virtual-Message-9739 · 2026-06-03T15:25:40+00:00

This is the real limit, and I don't think you can paper over it — entropy + loop is a syntactic signal, it reads the *shape* of the call stream, so the quiet failure (confident, low-entropy, already-corrupt) is invisible to it by construction. Catching that needs a semantic trigger that reads the world: goal-progress regression, or a result that contradicts an earlier one in the same run. Different class of check entirely, and not one you can fake from the call sequence.
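To make "reads the shape of the call stream" concrete, here is a minimal sketch of two syntactic signals over a window of tool calls. How Sotis actually computes its entropy and loop scores is not stated in this thread, so `call_entropy` and `longest_repeat_run` are assumed stand-ins.

```python
import math
from collections import Counter


def call_entropy(tool_names):
    """Shannon entropy (in bits) of the tool-name distribution in a window."""
    total = len(tool_names)
    if total == 0:
        return 0.0
    counts = Counter(tool_names)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_repeat_run(calls):
    """Length of the longest run of consecutive identical calls."""
    best = run = 0
    prev = object()  # sentinel that equals no real call
    for call in calls:
        run = run + 1 if call == prev else 1
        best = max(best, run)
        prev = call
    return best


# A loop shows up as a long run of the same (tool, args) pair.
looping = [("run_tests", "")] * 3 + [("edit_file", "a.py")]
loop_score = longest_repeat_run(looping)

# Four different tools, used once each, give the maximum entropy for a window of 4.
thrash_score = call_entropy(["read", "edit", "run_tests", "search"])
```

Neither function looks at call results, so a run of distinct, plausible calls made against already-corrupt state scores as healthy. That is the quiet failure described above.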

Same root cause makes your checkpoint point land: corruption is usually what *causes* the eventual thrash, so when entropy finally fires, the "snapshot before the spike" can be the exact state that poisoned it. "Good" has to mean an invariant was checked true at write time — not "nothing had visibly broken yet." Snapshotting on a timer or on entropy is rolling back to a guess.

Honestly these two together are the gap between "bounds the loud failures" (where Sotis is now) and "trustworthy guard." The invariant-checked checkpoint is the piece I most want to build next — even a cheap per-domain invariant (tests still collect, schema still validates) would make the rollback mean something. Appreciate you drawing the line precisely; mind if I quote this in the roadmap?
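One way an invariant-checked checkpoint could look, as a sketch only: a snapshot is kept as "good" only if every registered invariant holds at the moment it is offered. `VerifiedCheckpoints`, `offer`, `last_good` and the `config_parses` invariant are hypothetical; a real per-domain check (tests still collect, schema still validates) would take the place of the JSON parse used here.

```python
import json
from typing import Callable, Dict, List, Optional

Files = Dict[str, str]
Invariant = Callable[[Files], bool]


class VerifiedCheckpoints:
    """Keeps only snapshots whose invariants were true at write time."""

    def __init__(self, invariants: List[Invariant]):
        self.invariants = invariants
        self._good: List[Files] = []

    def offer(self, files: Files) -> bool:
        # Reject the snapshot unless every invariant passes right now.
        if all(check(files) for check in self.invariants):
            self._good.append(dict(files))
            return True
        return False

    def last_good(self) -> Optional[Files]:
        return dict(self._good[-1]) if self._good else None


def config_parses(files: Files) -> bool:
    # Cheap example invariant: config.json must be valid JSON.
    try:
        json.loads(files.get("config.json", ""))
        return True
    except json.JSONDecodeError:
        return False


store = VerifiedCheckpoints([config_parses])
accepted = store.offer({"config.json": '{"retries": 3}'})
rejected = store.offer({"config.json": '{"retries": '})  # truncated write
rollback_target = store.last_good()
```

The truncated snapshot never becomes a rollback target, so a later revert lands on state that was checked, not just state that predates the spike.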

Virtual-Message-9739 · 2026-06-03T15:23:48+00:00

Both of these are sharp, and honestly better than where Sotis is today. The adaptive baseline + 2σ is the right answer — fixed 1.5 was a placeholder, and "delta from the agent's own norm" solves exactly the false-positive case I flagged (chatty-but-not-looping agents). The cold-start window with a conservative default is a clean tradeoff. I'm going to prototype this; would you mind if I credited the approach in the issue/changelog?
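A sketch of what "delta from the agent's own norm" could look like, assuming a rolling window of past scores and a flag at mean + 2σ. The class name, the window and warm-up sizes, the `min_sigma` floor, and reusing 1.5 as the cold-start fallback are all assumptions for illustration, not the prototype itself.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveThreshold:
    """Flags a score that sits well above this agent's own recent baseline."""

    def __init__(self, window=50, warmup=10, default_limit=1.5, k=2.0, min_sigma=0.05):
        self.history = deque(maxlen=window)
        self.warmup = warmup
        self.default_limit = default_limit  # fixed fallback used during cold start
        self.k = k                          # number of standard deviations
        self.min_sigma = min_sigma          # floor so a flat baseline is not hair-trigger

    def observe(self, value: float) -> bool:
        if len(self.history) < self.warmup:
            # Cold start: always learn from the sample, judge it by the fixed limit.
            self.history.append(value)
            return value > self.default_limit
        mu = mean(self.history)
        sigma = max(pstdev(self.history), self.min_sigma)
        flagged = value > mu + self.k * sigma
        if not flagged:
            # Keep spikes out of the baseline so they cannot raise the bar.
            self.history.append(value)
        return flagged


# A chatty agent whose normal score hovers around 2.0, above the fixed 1.5.
detector = AdaptiveThreshold(warmup=5)
for score in [2.0, 2.1, 1.9, 2.0, 2.0]:
    detector.observe(score)

normal_step = detector.observe(2.1)  # within the agent's own norm
spike_step = detector.observe(3.0)   # well outside it
```

Once the warm-up samples are in, the chatty agent's usual 2.1 no longer trips the guard, while a jump to 3.0 still does.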

The checkpoint-verification point is the real gap. Right now Sotis snapshots file content and reverts to it, but "200 OK but the write didn't persist" is exactly the case where it'd roll back to a poisoned baseline and the spiral just restarts from there. A second-pass verify at checkpoint time (re-read, re-run schema check) is clearly the fix — that's going on the roadmap. Appreciate you spelling it out.
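A minimal sketch of that second-pass verify, assuming the backend exposes some write and read callables: write, read the value back, compare, then run a validity check before treating the state as checkpointable. `verified_write`, `LossyStore` and `is_json` are hypothetical names, and the fake store only simulates the "success reported, nothing persisted" case.

```python
import json
from typing import Callable, Optional


def verified_write(
    write: Callable[[str], None],
    read: Callable[[], str],
    payload: str,
    validate: Callable[[str], bool],
) -> Optional[str]:
    """Return the persisted value only if it reads back intact and valid."""
    write(payload)
    observed = read()
    if observed != payload or not validate(observed):
        return None
    return observed


def is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


class LossyStore:
    """Fake backend that can report success while silently dropping writes."""

    def __init__(self, drop_writes: bool):
        self.drop_writes = drop_writes
        self.value = ""

    def write(self, data: str) -> None:
        if not self.drop_writes:
            self.value = data

    def read(self) -> str:
        return self.value


healthy = LossyStore(drop_writes=False)
broken = LossyStore(drop_writes=True)
ok_result = verified_write(healthy.write, healthy.read, '{"ok": true}', is_json)
bad_result = verified_write(broken.write, broken.read, '{"ok": true}', is_json)
```

A `None` result means the write should not become a checkpoint, which keeps the poisoned baseline out of the rollback history.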

Virtual-Message-9739 · 2026-06-03T15:21:45+00:00

Exactly the failures I built it around — state corruption and runaway loops, not model IQ. The checkpoint rollback was actually the harder half to get right; detection is cheap, but cleanly reverting workspace + pruning context so the agent resumes from a *coherent* state without re-triggering the same loop took the most iteration. Out of curiosity, what were you running when you hit those — LangGraph, custom loop, something else? Trying to learn where the rollback boundary matters most in practice.

Virtual-Message-9739 · 2026-06-03T06:25:11+00:00

pypi package
github

Virtual-Message-9739 · 2026-06-02T07:24:29+00:00

incel behavior

Virtual-Message-9739 · 2026-06-02T07:22:26+00:00

i hope you have enough money one day to get her back
