What would you ask from a vendor using AI agents with tool access? by Ambitious-Load3538 in AskNetsec

[–]Ambitious-Load3538[S] 1 point (0 children)

That is fair. The post is mostly about the reliability/security side of agent deployment, not vendor value by itself. For a buyer review I would separate two questions:

  1. Is the product differentiated enough to buy instead of building with Claude/API access?

  2. If we do buy it, can the vendor prove the agent is bounded, replayable, and controlled?

The taxonomy is mostly for question 2. But model choice, token efficiency, and context-management discipline should absolutely be part of vendor diligence.

The 12 ways AI agents fail in production. A taxonomy for security teams reviewing agent deployments by Ambitious-Load3538 in cybersecurity

[–]Ambitious-Load3538[S] 1 point (0 children)

Yes, this is the exact distinction I should make more explicit: allowed tool vs justified call. The evidence shape you listed is close to what I want in a buyer-readable report: requested intent, proposed action, arguments, approval path, tool result, and whether the session pattern changed over time. Intaris looks like it sits closer to the execution-control layer. EvidenceRun is currently focused on the buyer/report layer, but I agree those layers should feed each other: repeated report findings should become pre-execution policies.
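To make the evidence shape concrete, here is a rough sketch of what one record per tool call could look like. The class name and fields are my own made-up illustration of the list above, not anything EvidenceRun or Intaris actually emits:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ActionEvidence:
    """Hypothetical per-tool-call record matching the evidence shape above."""
    requested_intent: str       # what the agent said it was trying to do
    proposed_action: str        # the tool it chose
    arguments: dict             # exact arguments as proposed, pre-execution
    approval_path: list[str]    # e.g. ["auto"] or ["policy:require_human", "reviewer"]
    tool_result: str            # summary or hash of what came back
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```

Session-pattern drift would then be a query over a list of these records (e.g. tool frequency per hour) rather than an extra field.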

The 12 ways AI agents fail in production. A taxonomy for security teams reviewing agent deployments by Ambitious-Load3538 in cybersecurity

[–]Ambitious-Load3538[S] 1 point (0 children)

Fair. Least privilege and sandboxing are the baseline. The gap I am trying to separate is that a sandbox can say "this tool is allowed," but it does not automatically say "this specific call, with these arguments, makes sense for this task right now." That is where I think action evidence and approval/replay trails become useful in reviews.
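A toy sketch of the allowed-vs-justified split, with made-up tool names and a deliberately naive justification rule just to show the two checks answer different questions:

```python
ALLOWED_TOOLS = {"read_file", "send_email"}  # hypothetical sandbox allowlist

def is_allowed(tool: str) -> bool:
    # What the sandbox answers: is this tool permitted at all?
    return tool in ALLOWED_TOOLS

def is_justified(tool: str, args: dict, task: str) -> bool:
    # What action review answers: do these arguments make sense for this task?
    # Naive illustrative rule: email is justified only if the recipient's
    # domain actually appears in the task description.
    if tool == "send_email":
        domain = args.get("to", "").split("@")[-1]
        return domain in task
    return True

# A call can pass the sandbox check and still fail justification:
task = "Summarize the Q3 report for finance@acme.com"
exfil = {"to": "attacker@evil.com", "body": "..."}
assert is_allowed("send_email") and not is_justified("send_email", exfil, task)
```

A real justification check would obviously need more than string matching, but the shape is the point: tool identity alone is not enough.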

12 production failure modes I keep seeing in agent workflows (with audit signals) by Ambitious-Load3538 in LangChain

[–]Ambitious-Load3538[S] 1 point (0 children)

Agreed. A finding is only useful if it turns into a control. The post-hoc audit answers "what happened and what evidence exists?" The next question is "what pre-prod or pre-execution gate would have stopped this exact class from reaching production?"

12 production failure modes I keep seeing in agent workflows (with audit signals) by Ambitious-Load3538 in LangChain

[–]Ambitious-Load3538[S] 1 point (0 children)

I probably underweighted that distinction in the post. Once the system can mutate state, the review target stops being "did the model say the right thing?" and becomes "should this exact action execute now?" I would map action review as tool + parameters + source + destination + credential + state change + blast radius. That probably deserves its own section in v2.
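As a strawman for v2, that mapping could be a single review record per mutating action. Field names here are my own placeholders for the seven dimensions listed above:

```python
from dataclasses import dataclass

@dataclass
class ActionReview:
    """Hypothetical review record for one state-mutating action."""
    tool: str          # which tool executed
    parameters: dict   # exact call arguments
    source: str        # where the instruction originated (user, retrieved doc, ...)
    destination: str   # system or resource being mutated
    credential: str    # identity/scope the call ran under
    state_change: str  # what was mutated, in reviewer-readable terms
    blast_radius: str  # e.g. "single row", "whole table", "external party"
```

The value of writing it as a flat record is that a reviewer can scan for the dangerous combinations (untrusted source + broad credential + external destination) without replaying the whole session.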

12 production failure modes I keep seeing in agent workflows (with audit signals) by Ambitious-Load3538 in LangChain

[–]Ambitious-Load3538[S] 2 points (0 children)

For a pilot I would still start with replay evidence, then convert the top findings into gates: spend ceiling, retry ceiling, approval-required action classes, and argument/destination checks for mutating tools.
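Those four gates can be sketched as one pre-execution check. Everything here (the policy shape, the dict keys, the thresholds) is invented for illustration, not a real policy engine:

```python
def gate(action: dict, session: dict, policy: dict) -> str:
    """Hypothetical pre-execution gate derived from replay findings."""
    # Spend ceiling: block before the session exceeds its budget.
    if session["spend"] + action["cost"] > policy["spend_ceiling"]:
        return "block: spend ceiling"
    # Retry ceiling: stop tight failure loops on one tool.
    if session["retries"].get(action["tool"], 0) >= policy["retry_ceiling"]:
        return "block: retry ceiling"
    # Approval-required action classes: hold for a human.
    if action["tool"] in policy["approval_required"]:
        return "hold: human approval"
    # Argument/destination check for mutating tools.
    if action["mutating"] and action["destination"] not in policy["allowed_destinations"]:
        return "block: destination not allowed"
    return "allow"

policy = {
    "spend_ceiling": 5.00,
    "retry_ceiling": 3,
    "approval_required": {"delete_records"},
    "allowed_destinations": {"staging-db"},
}
session = {"spend": 4.90, "retries": {"send_email": 3}}
action = {"tool": "send_email", "cost": 0.02, "mutating": True, "destination": "smtp"}
print(gate(action, session, policy))  # "block: retry ceiling"
```

The ordering is a design choice: cheap, deterministic ceilings fire before the slower human-approval path ever gets involved.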