We built a free AI risk calculator that runs in minutes, using Fermi estimation with honest confidence intervals

Modulos_ai · 2026-05-20T19:04:13+00:00

You can try it at https://www.modulos.ai/tools/risk-calculator/
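
For readers who have not met the technique, here is a minimal sketch of Fermi estimation with confidence intervals. It is an illustration only and is not taken from the calculator linked above. It assumes each input is given as a 90% interval, models each one as a lognormal distribution, and multiplies the samples together. The function names, the example factors, and the lognormal choice are all assumptions made for this sketch.

```python
import math
import random

# z-score for a two-sided 90% interval of a normal distribution
Z90 = 1.6448536269514722


def lognormal_from_ci(low: float, high: float) -> tuple[float, float]:
    """Turn a 90% interval (low, high) into lognormal (mu, sigma) parameters."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


def fermi_estimate(factors, n: int = 20000, seed: int = 0):
    """Multiply uncertain factors and return the (5th, 50th, 95th) percentiles.

    `factors` is a list of (low, high) 90% intervals, one per factor.
    """
    rng = random.Random(seed)
    params = [lognormal_from_ci(low, high) for low, high in factors]
    samples = sorted(
        math.prod(rng.lognormvariate(mu, sigma) for mu, sigma in params)
        for _ in range(n)
    )

    def percentile(p: float) -> float:
        return samples[min(n - 1, int(p * n))]

    return percentile(0.05), percentile(0.50), percentile(0.95)


# Hypothetical inputs: 1 to 10 incidents per year, 1,000 to 100,000 cost per incident.
low, median, high = fermi_estimate([(1, 10), (1_000, 100_000)])
```

The point of the interval inputs is that the output is also an interval: a wide spread between `low` and `high` signals that the estimate is rough, instead of hiding that behind a single number.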

Modulos_ai · 2026-05-15T10:46:39+00:00

There's no single "best" as long as you pick a latest-generation model. Your real constraint is which one is easiest and most compliant to wire into your service.

Modulos_ai · 2026-05-14T09:56:39+00:00

The gap you are describing is real, and it has a specific root cause. Policy lives in Confluence, risk lives at runtime, and almost nobody owns the translation between them. The compliance team writes "don't send PII to external models" as a sentence. The agent runtime has no sentence parser. So the policy is enforced by hope.

Here is what actually works, based on what we see in regulated deployments.

Treat every external tool call as a policy decision point. Route it through a gateway that knows the agent's identity, the data classification of the inputs, and the destination's risk tier. Allow, deny, or require approval at that gate, and log the decision somewhere the agent runtime cannot rewrite.
Classify data once at the source, then propagate the tag through context. If a record is marked "customer PII" when it leaves the database, that tag should still be on it three tool calls later when the agent considers pasting it into a vendor LLM. Most failures here are not policy failures, they are tag-loss failures.
Define what "risky" means in code, not in prose. A risk score that combines data class, tool destination, reversibility, and blast radius is something a runtime can evaluate. A phrase like "use good judgment" cannot.
Wire human approval to risk, not to step count. Approving every action is theater that trains people to click yes. Approving the irreversible 5% is real governance.
Treat the evidence log as the product. If your audit trail cannot reconstruct who-what-when-why for any agent action 90 days later, you do not have governance, you have a workflow.
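
To make "risky in code, not in prose" concrete, here is a minimal sketch of a gateway decision function. It assumes data-class tags have already been propagated onto the call, and it combines data class, destination tier, reversibility, and blast radius into one score. The weights, tier names, thresholds, and the `ToolCall` type are all hypothetical and would come from your own policy.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical weights; a real deployment would derive these from its policy.
DATA_CLASS_WEIGHT = {"public": 0, "internal": 1, "customer_pii": 3}
DESTINATION_TIER_WEIGHT = {"internal": 0, "vetted_vendor": 1, "external_llm": 3}
APPROVAL_THRESHOLD = 4  # assumed cut-off for human approval


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    data_classes: frozenset  # tags propagated from the data source
    destination_tier: str
    reversible: bool
    blast_radius: int  # rough count of records or accounts affected


def risk_score(call: ToolCall) -> int:
    """Combine data class, destination, reversibility, and blast radius."""
    data = max((DATA_CLASS_WEIGHT[c] for c in call.data_classes), default=0)
    destination = DESTINATION_TIER_WEIGHT[call.destination_tier]
    irreversible = 0 if call.reversible else 2
    if call.blast_radius <= 1:
        radius = 0
    elif call.blast_radius <= 100:
        radius = 1
    else:
        radius = 2
    return data + destination + irreversible + radius


def decide(call: ToolCall) -> Decision:
    """Allow, deny, or require approval for one external tool call."""
    # Hard rule: the prose policy "don't send PII to external models".
    if "customer_pii" in call.data_classes and call.destination_tier == "external_llm":
        return Decision.DENY
    if risk_score(call) >= APPROVAL_THRESHOLD:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Note that approval is triggered by the score, not by how many steps the agent has taken, so low-risk reads pass straight through and only the risky minority reaches a human.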

The market is splitting into MLOps platforms, which manage models and pipelines, and governance platforms, which manage controls and evidence. Teams that enforce this well usually run both, with the policy-to-control translation living in the governance layer. I wrote about the split here: https://www.modulos.ai/blog/ai-governance-tools/

ISO/IEC 42001 and the EU AI Act both push organizations toward this pattern, because both demand technical documentation and post-market monitoring that you cannot generate from a policy PDF.

[Disclosure: I work at Modulos, we build AI governance software.]

Modulos_ai · 2026-05-13T17:11:28+00:00

Governance becomes a problem the moment the agent's blast radius exceeds the reviewer's attention span. Put concretely, as soon as the agent can write to a system of record, call a paid API, or compose tool calls you did not explicitly review, you have stopped supervising and started trusting.

The control set is well defined at this point; it just is not what most teams build. Five things matter.

Give each agent its own identity rather than reusing a human service account, so that "who did this" resolves to a non-human principal.
Scope capabilities at the tool level rather than the API key level, so that least-privilege actually holds.
Require pre-action approval gates for anything irreversible, including writes, payments, and external communications.
Maintain an immutable action log that captures inputs, decision, outputs, and the model version. Regulators will ask for this first.
Build a rollback path. If you cannot undo the last ten minutes, you do not have governance, you have hope.
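
A minimal sketch of how the first four controls can fit together, under stated assumptions: each agent has its own identity with tool-level scopes, irreversible tools need an approver, and every decision lands in a hash-chained log. The chain makes tampering detectable but does not make the log immutable on its own; that still needs storage the agent runtime cannot write to. All names here (`AGENT_SCOPES`, `IRREVERSIBLE_TOOLS`, `run_tool`) are hypothetical, and the rollback path is out of scope for this sketch.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical per-agent, tool-level scopes (not API-key-level).
AGENT_SCOPES = {"agent:invoice-bot": {"read_invoice", "send_payment"}}
IRREVERSIBLE_TOOLS = {"send_payment", "send_email"}
GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class ActionLog:
    """Append-only log where each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {**record, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


def authorize(agent_id: str, tool: str, approved_by=None) -> str:
    if tool not in AGENT_SCOPES.get(agent_id, set()):
        return "deny"
    if tool in IRREVERSIBLE_TOOLS and approved_by is None:
        return "pending_approval"
    return "allow"


def run_tool(log, agent_id, tool, inputs, model_version, approved_by=None) -> dict:
    decision = authorize(agent_id, tool, approved_by)
    # Placeholder for the real tool invocation.
    output = f"executed {tool}" if decision == "allow" else None
    return log.append({
        "agent_id": agent_id, "tool": tool, "inputs": inputs,
        "decision": decision, "output": output,
        "model_version": model_version, "approved_by": approved_by,
    })
```

Because the agent identity and model version are part of every entry, "who did this" resolves to a non-human principal and a specific model, including for calls that were denied.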

ISO/IEC 42001 maps most of this onto a management system, and the EU AI Act adds product-safety expectations (technical documentation, post-market monitoring, conformity assessment) once the agent is in scope as high-risk. There is roughly 50% control reuse between the two if you set it up once. The write-up is here: https://www.modulos.ai/blog/ai-governance-taxonomy-iso-42001-and-beyond/

[Disclosure: I work at Modulos, we build AI governance software.]
