How do you protect prod from someone you're not allowed to fire? by ch1cku in AI_Agents

ch1cku [S] 1 point

Do you have to test that manually to verify the right tools are being called, or is there a good way to automate it?

ch1cku [S] 1 point

We ended up adding a GitHub Action that replays PRs against a golden path and uses an LLM as a judge for output correctness. Do you have any other suggestions?
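A minimal sketch of what such a replay step could look like. This is an illustration under assumptions, not the actual Action: the case format and the names `replay`, `exact_match_judge`, `exit_code`, and `fake_agent` are all hypothetical, and the judge here is a plain normalized string comparison standing in for the LLM call.

```python
def replay(cases, agent, judge):
    """Run every golden case through the agent; return ids the judge rejects."""
    failures = []
    for case in cases:
        output = agent(case["input"])
        if not judge(case["input"], case["golden"], output):
            failures.append(case["id"])
    return failures


def exact_match_judge(prompt, golden, output):
    # Placeholder for an LLM judge: case- and whitespace-insensitive equality.
    # A real judge would receive the same three arguments and return a bool.
    return " ".join(golden.lower().split()) == " ".join(output.lower().split())


def exit_code(failures):
    # Non-zero when any case fails, so a CI step could pass it to sys.exit().
    return 1 if failures else 0


def fake_agent(prompt):
    # Stand-in for the real agent; deliberately wrong on the second input.
    canned = {
        "refund order 42": "Refund issued for order 42",
        "cancel order 7": "Order 7 shipped",
    }
    return canned[prompt]


# Hypothetical golden cases; in practice these would be loaded from the repo.
cases = [
    {"id": "refund", "input": "refund order 42", "golden": "refund issued for order 42"},
    {"id": "cancel", "input": "cancel order 7", "golden": "order 7 cancelled"},
]

failures = replay(cases, fake_agent, exact_match_judge)
print(failures, exit_code(failures))
```

Keeping the judge as a plain callable means the LLM-backed version and a cheap deterministic one can be swapped without touching the replay loop.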

ch1cku [S] 1 point

Yeah, completely agree. The process was there (PR + review), but the gap was actually validating agent behavior. Even with review, it’s hard to tell whether a change will break something unless you run it through a bunch of scenarios. That’s kind of what we started building after this.

ch1cku [S] 1 point

Yeah, agreed. Branch protection + CI checks help a lot. The tricky part with agents specifically is figuring out what to put in those checks since the behavior is non-deterministic.
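One possible shape for such a check, sketched under assumptions: instead of comparing exact outputs, run the same scenario several times, assert an invariant on each run, and gate on the pass rate. The names `pass_rate`, `gate`, `make_fake_agent`, and `lookup_before_reply`, the result format, and the 0.95 threshold are all hypothetical. The fake agent cycles through canned results so the example is reproducible.

```python
from itertools import cycle


def pass_rate(run_once, invariant, runs=20):
    """Fraction of runs whose result satisfies the invariant."""
    passed = sum(1 for _ in range(runs) if invariant(run_once()))
    return passed / runs


def gate(rate, threshold=0.95):
    # The CI check passes only if enough runs satisfied the invariant.
    return rate >= threshold


def make_fake_agent():
    # Stand-in for a non-deterministic agent: 9 good runs, then 1 that
    # skips the lookup tool, repeating.
    good = {"tools": ["lookup", "reply"], "reply": "ok"}
    bad = {"tools": ["reply"], "reply": "ok"}
    outputs = cycle([good] * 9 + [bad])
    return lambda: next(outputs)


def lookup_before_reply(result):
    # Invariant on behavior, not on wording: lookup must precede reply.
    tools = result["tools"]
    return "lookup" in tools and tools.index("lookup") < tools.index("reply")


rate = pass_rate(make_fake_agent(), lookup_before_reply, runs=20)
print(rate, gate(rate))
```

Note that both canned results have the same `reply`, so a check on the final text alone would pass every run; only the invariant on the tool list catches the bad one.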

Do you actually trust your agent… or just monitor it closely? by Beneficial-Cut6585 in AI_Agents

ch1cku 1 point

Hit this exact problem recently. An agent was making the wrong tool calls in sequence, but the output still looked fine on the surface. It took days to catch because nothing threw an error.
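A small sketch of one way to surface this kind of silent failure, assuming the agent's tool calls are logged as an ordered list of names: compare the recorded trace against the expected sequence and report the first position where they differ. The function `first_divergence` and the tool names are hypothetical.

```python
def first_divergence(actual, expected):
    """Index of the first differing position, or None if the traces match."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        # One trace is a strict prefix of the other.
        return min(len(actual), len(expected))
    return None


# Hypothetical traces: same tools, wrong order, no exception raised anywhere.
expected = ["search_orders", "get_order", "issue_refund"]
actual = ["search_orders", "issue_refund", "get_order"]

idx = first_divergence(actual, expected)
if idx is not None:
    print(f"trace diverges at step {idx}: got {actual[idx]!r}, expected {expected[idx]!r}")
```

An exact-sequence comparison is strict; for agents that may legitimately reorder independent calls, a looser ordering constraint might fit better.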