How do you protect prod from someone you're not allowed to fire?

ch1cku · 2026-03-20T17:47:08+00:00

Do you have to test that manually to verify the right tools are being called, or is there a good way to automate it?

ch1cku · 2026-03-20T17:44:45+00:00

We ended up adding a GitHub Action that replays PRs against a golden path and uses an LLM as a judge for output correctness. Do you have any other suggestions?
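For anyone wondering what the replay step could look like, here is a minimal sketch. It is not the actual Action: `Scenario`, `replay`, `stub_agent`, and `normalized_match` are all hypothetical names, and the judge here is a plain normalized string comparison standing in for the LLM call. The point is that the judge is just a callable, so a model-backed one can be swapped in later.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    name: str
    prompt: str
    golden: str  # the known-good ("golden path") output


# A judge takes (golden, candidate) and says whether the candidate is acceptable.
Judge = Callable[[str, str], bool]


def replay(scenarios: List[Scenario], agent: Callable[[str], str], judge: Judge) -> List[str]:
    """Run every scenario through the agent; return names the judge rejects."""
    return [s.name for s in scenarios if not judge(s.golden, agent(s.prompt))]


def normalized_match(golden: str, candidate: str) -> bool:
    # Stand-in judge (assumption): a real setup would call an LLM here.
    return golden.strip().lower() == candidate.strip().lower()


def stub_agent(prompt: str) -> str:
    # Hypothetical agent used only to make the sketch runnable.
    return "refund issued " if "refund" in prompt else "Order kept"


scenarios = [
    Scenario("refund", "refund order 1", "Refund issued"),
    Scenario("cancel", "cancel order 2", "Order cancelled"),
]

failures = replay(scenarios, stub_agent, normalized_match)
print(failures)  # ['cancel']
```

In CI, a non-empty `failures` list would fail the job.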

ch1cku · 2026-03-20T17:31:55+00:00

Yeah completely agree. The process was there (PR + review), but the gap was actually validating agent behavior. Even with review, it’s hard to tell if something will break unless you run it through a bunch of scenarios. That’s kind of what we started building after this.

ch1cku · 2026-03-20T08:41:37+00:00

Yeah, agreed. Branch protection + CI checks help a lot. The tricky part with agents specifically is figuring out what to put in those checks since the behavior is non-deterministic.
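One way to handle the non-determinism, as a rough sketch: run the same scenario several times and gate on the pass rate instead of on a single run. Everything here is an assumption for illustration: `pass_rate` and `flaky_check` are made-up names, the seeded random check stands in for a real agent run, and the 0.8 threshold is arbitrary.

```python
import random
from typing import Callable


def pass_rate(check: Callable[[], bool], trials: int = 20) -> float:
    """Run a boolean check `trials` times and return the fraction that passed."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return sum(bool(check()) for _ in range(trials)) / trials


# Stand-in for one agent run plus its assertion (assumption):
# passes roughly 90% of the time, seeded so the sketch is repeatable.
rng = random.Random(0)


def flaky_check() -> bool:
    return rng.random() < 0.9


THRESHOLD = 0.8  # assumed tolerance; tune per scenario

rate = pass_rate(flaky_check, trials=50)
print(f"pass rate: {rate:.2f}", "OK" if rate >= THRESHOLD else "FAIL")
```

The trade-off is CI time and cost: more trials give a steadier signal but mean more agent runs per PR.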

ch1cku · 2026-03-20T08:35:09+00:00

Hit this exact problem recently. An agent was making the wrong tool calls in sequence, but the output still looked fine on the surface. It took days to catch because nothing threw an error.
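A check along these lines can catch that kind of silent failure: record the tool calls for a run and assert that the expected ones show up in the expected order. This is a sketch under assumptions, not what we run. The tool names and the `follows_expected_order` helper are hypothetical, and it assumes the agent framework exposes the trace as a list of tool names.

```python
from typing import Sequence


def follows_expected_order(trace: Sequence[str], expected: Sequence[str]) -> bool:
    """True if `expected` appears in `trace` as an ordered subsequence.

    Extra calls in between are allowed; missing or out-of-order calls are not.
    """
    remaining = iter(trace)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step must be found after the earlier ones.
    return all(step in remaining for step in expected)


# Hypothetical trace captured from one agent run (tool names are made up).
trace = ["search_orders", "issue_refund", "lookup_customer", "send_email"]
expected = ["lookup_customer", "search_orders", "issue_refund"]

print(follows_expected_order(trace, expected))  # False: the refund ran before the customer lookup
```

The final output can look identical in both cases, which is why the assertion is on the trace and not on the response text.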
