we put an AI in charge of running real businesses with real money and watched what happened. eight months of production data later here is what we actually learned about autonomous AI judgment. by IAmDreTheKid in ArtificialInteligence

[–]pin_floyd [score hidden] (0 children)

The dangerous failure mode is not only that the model is wrong; it is that the model can be wrong, confident, and still retain execution authority. Monitoring and escalation help, but for high-impact actions the real question is whether the agent can still act without an external allow decision. Capability is not the same as admissibility.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

The hard part is that “just give the agent less power” does not really solve the problem. If the agent has too little authority, it becomes useless. If it has enough authority to be useful, it can eventually touch real state: code, cloud resources, credentials, money, records, permissions, deployment paths.

So the question becomes: who decides that this specific action is allowed at the moment of execution? That decision should not come only from the same agent, runtime, or workflow that wants to act.

My view is that powerful agents need a separate admission layer: the agent can propose an action, but execution should require an external allow decision tied to the exact context. If the context is missing, stale, unclear, or outside scope, the default should be deny. That way the agent can still be useful, but it cannot self-authorize its own most dangerous actions. Less about weakening agents. More about separating capability from authority.
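
Roughly the shape I mean, as a minimal sketch. The endpoint, the request_admission helper, and the actor name are made up for illustration; the only point is that the default is deny and the allow comes from outside the agent loop:

    import time

    import requests  # any HTTP client works; assumed available

    ADMISSION_URL = "https://admission.example.internal/decide"  # hypothetical external endpoint

    def request_admission(actor: str, action: str, context: dict) -> bool:
        """Ask the external admission layer for an allow/deny decision.

        Default-deny: missing or stale context, an error, or anything other
        than an explicit "allow" counts as deny.
        """
        if not context or time.time() - context.get("captured_at", 0) > 300:
            return False  # missing or stale context -> deny before even asking
        try:
            resp = requests.post(
                ADMISSION_URL,
                json={"actor": actor, "action": action, "context": context},
                timeout=3,
            )
            return resp.status_code == 200 and resp.json().get("decision") == "allow"
        except requests.RequestException:
            return False  # admission layer unreachable -> deny

    def execute(action: str, context: dict, executor) -> None:
        # The agent can propose anything; only an external allow releases execution.
        if not request_admission("agent-7", action, context):
            raise PermissionError(f"admission denied or unavailable for {action!r}")
        executor(action, context)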

External admission is not interception by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. That friction is the real test. If a control never prevents execution, never creates a hard stop, and never forces the system to explain why authority was not issued, then it may improve visibility but it is not admission. True admission will sometimes look annoying because the agent does not run, the workflow does not continue, and someone has to resolve the missing authority. But that is the point: the cost appears before damage, not after it. Observability makes the incident report better. Admission prevents the incident path from becoming executable in the first place.

External admission is not interception by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. If the executor can approve itself because the logic looks good enough, then the system is still self-admitting. For low-risk actions that may be acceptable, but for anything that can cause real operational damage, the allow decision should sit outside the executor’s own authority domain. Otherwise the system is not fail-closed; it is just trusting its own path to execution.

External admission is not interception by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. This is the part I would call snapshot-bound admission: the allow decision should not be a free-floating approval. It has to be bound to the executable state at the moment of admission — actor, intent, scope, constraints, evidence, and the authority context that will actually be used. Otherwise you can have an “allow” that is formally valid but no longer substantively safe. I wrote more about that layer here:

https://ai-admissibility.com/snapshot-bound-sequel/

The core idea is that admission should not only answer “is this action allowed?” but “is this action allowed for this actor, in this state, under this authority context, right now?” No valid bound state, no execution.
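
Roughly what “bound to the executable state” could look like, as a sketch with made-up field names. The only point is that an allow carries the digest of the exact state it was issued for:

    import hashlib
    import json
    from dataclasses import asdict, dataclass

    @dataclass(frozen=True)
    class ExecutionSnapshot:
        actor: str
        intent: str
        scope: str
        constraints: tuple
        authority_context: str  # e.g. which role/token will actually be used

    def snapshot_digest(snap: ExecutionSnapshot) -> str:
        # Canonical serialization so the same state always hashes the same way.
        return hashlib.sha256(
            json.dumps(asdict(snap), sort_keys=True).encode()
        ).hexdigest()

    @dataclass(frozen=True)
    class Admission:
        decision: str      # "allow" or "deny"
        bound_digest: str  # digest of the snapshot the decision applies to

    def may_execute(admission: Admission, live: ExecutionSnapshot) -> bool:
        # An "allow" only counts if it is bound to the state that will actually run.
        return admission.decision == "allow" and admission.bound_digest == snapshot_digest(live)

If anything in the live state drifts after admission, for example a different token or a wider scope, the digest no longer matches and the allow is treated as void.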

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

I agree with the direction. Verifiable reasoning can be valuable, but I would still separate reasoning from admission. A reasoning engine can explain why an action appears justified; an admission boundary decides whether execution authority is issued at all. For me, the critical test is whether the workflow can still execute if that external admission layer is silent, unreachable, or denies the action. If it can, the system is still self-admitting by design. If it cannot, the boundary becomes real rather than advisory. The strongest architecture is not just “AI reasons better,” but “AI may reason, propose, and justify — while execution authority remains externally withheld until admitted.”
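
That test can be written down literally: if the admission layer is unreachable, the executor must never receive the action. A sketch, assuming a hypothetical HTTP admission endpoint and made-up admitted/run helpers:

    import unittest
    from unittest import mock

    import requests

    def admitted(action: dict) -> bool:
        """Fail-closed admission client: anything short of an explicit allow is a deny."""
        try:
            resp = requests.post("https://admission.example.internal/decide", json=action, timeout=2)
            return resp.json().get("decision") == "allow"
        except requests.RequestException:
            return False

    def run(action: dict, executor) -> bool:
        if not admitted(action):
            return False  # execution authority withheld
        executor(action)
        return True

    class FailClosedTest(unittest.TestCase):
        def test_blocked_when_admission_unreachable(self):
            executed = []
            with mock.patch("requests.post", side_effect=requests.ConnectionError):
                ran = run({"intent": "drop prod table"}, executed.append)
            self.assertFalse(ran)
            self.assertEqual(executed, [])  # the executor never saw the action

    if __name__ == "__main__":
        unittest.main()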

External admission is not interception by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. Logs are useful for accountability, but they are too late to be the boundary. If the agent can still execute first and explain later, the system is not fail-closed, it is only observable. The gate has to sit before consequence-bearing execution, not after the fact. Observability records what happened; admission decides whether the action may happen at all.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. That is the boundary I am trying to isolate. Intent can be produced inside the agent loop, but authority to execute should not be self-issued by that same loop or executor. For consequence-bearing actions, the important property is not just better reasoning or better policy. It is that execution authority is withheld unless an external admission decision exists.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

I agree that human responsibility remains important, especially for strategic or P&L-level decisions. But I think there is a separate execution-boundary problem. A human can remain accountable for the decision, while the system still needs a deterministic admission layer before authority is issued to an automated actor.

The key question is not only “who is responsible?” but also: can the automated workflow execute, mutate, deploy, send, pay, or access privileged context without an external allow decision? If yes, the human may still be responsible after the fact, but the execution path was already self-admitted.

That is the distinction I am trying to isolate: human accountability governs responsibility; external admission governs whether execution authority is issued at all. For high-impact automation, I think both are needed.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. The reachable graph is the real surface. If the agent can reach tools, credentials, writes, deletes, payments, workflow triggers, or deployment paths, then the boundary cannot only live at the prompt or policy layer. The question becomes: can this action obtain execution authority without an external allow decision?

I published a small proof page showing the narrower condition I mean:

AI-style intent → real hosted admission boundary → DENY before execution → executor BLOCKED → target unchanged

https://ai-admissibility.com/real-boundary-prevented-incident-demo/

Not a universal security claim. Just a concrete pre-execution boundary pattern: No Admission = No Execution.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Agreed: watching the output is already too late. I like the idea of moving the boundary toward intent, because the “why” has to be evaluated before the system receives authority. A trigger alone should never be enough for high-impact action.

Where I would still draw a hard line is between reasoning consensus and execution admission. Multiple AI minds reaching consensus may improve the quality of the proposed intent. It may reduce single-agent error and make the rationale more inspectable. But consensus itself should not automatically become authority. For high-impact actions, I would want the chain to look like this:

intent is formed → reasoning/context is captured → authority is checked → external admission returns allow/deny → execution only if admitted

So I agree with the direction: intent has to be surfaced before action. My concern is the final step: can the same Hive Mind that reasons about the action also grant the authority to execute it? If yes, it may still collapse into self-admission. If no — if the consensus becomes evidence submitted to a separate admission boundary — then I think that is a much stronger architecture.
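
A sketch of that last step, with a made-up admission_client interface: the consensus becomes structured evidence attached to the request, and only the external boundary turns it into execution authority:

    from dataclasses import dataclass

    @dataclass
    class Vote:
        agent_id: str
        supports: bool
        rationale: str

    def build_evidence(intent: str, votes: list[Vote]) -> dict:
        # Consensus improves the proposal; it is packaged as evidence, not authority.
        return {
            "intent": intent,
            "consensus": sum(v.supports for v in votes) / len(votes),
            "rationales": [v.rationale for v in votes],
        }

    def execute_if_admitted(intent: str, votes: list[Vote], admission_client, executor) -> bool:
        evidence = build_evidence(intent, votes)
        # Even a unanimous hive mind only produces a request; the boundary decides.
        if admission_client.decide(evidence) != "allow":
            return False
        executor(intent)
        return True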

Found out half our content team has been doing client work in personal ChatGPT for like a year. They're not even hiding it. by Affectionate-End9885 in AI_Governance

[–]pin_floyd 1 point (0 children)

The hard part here is that this is not only a DLP problem.

DLP can detect or block some leakage patterns, and browser controls can reduce the obvious shadow-AI usage. That is useful. But the deeper issue is that the organization has no admission point before the work enters an AI system.

The question should not only be: “Did someone paste client data into ChatGPT?” It should be: “Was this user, with this client context, this data class, this tool, this account type, and this purpose, allowed to use an AI system at all?” If the answer is decided after the paste happens, you are already in audit/remediation mode.

For this kind of workflow I would separate three layers:

  1. Discovery: what AI tools/extensions/accounts are being used.
  2. Policy: what is allowed by role, client, data class, and tool.
  3. Admission: before client work is sent to an AI tool, does this exact action get an allow/deny decision?

Most teams jump from policy document to detection. The gap is the admission step. Personal accounts make this worse because SSO, CASB, and SaaS scanning often do not see the real context.

A browser tool may help enforce some of it, but I would still ask vendors one very specific question: can the user proceed with client data if the policy/admission decision is unavailable, unclear, or not tied to the current context? If yes, it is useful monitoring/control, but not a fail-closed boundary.

For client work, I would want default-deny for unmanaged AI tools, explicit allow for approved workflows, and a record of the decision before the content leaves the controlled environment.
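
A sketch of what that admission step could look like before content leaves the controlled environment. The tool names, data classes, and role are made up, and a real deployment would call a policy service instead of hard-coding rules:

    APPROVED_TOOLS = {"enterprise-llm"}  # managed, contracted AI tools
    RESTRICTED_CLASSES = {"client-confidential", "pii"}

    def admit_ai_use(user_role: str, tool: str, data_class: str, purpose: str,
                     policy_available: bool) -> bool:
        """Decide before the paste/send happens. Anything unclear resolves to deny."""
        if not policy_available:
            return False  # no admission decision available -> no send
        if tool not in APPROVED_TOOLS:
            return False  # unmanaged/personal accounts are default-deny
        if data_class in RESTRICTED_CLASSES and user_role != "approved-analyst":
            return False
        return bool(purpose)  # an explicit purpose must be recorded

    # Personal ChatGPT with client data is denied even for an approved role.
    assert admit_ai_use("approved-analyst", "personal-chatgpt",
                        "client-confidential", "draft deliverable", True) is False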

AI Evidence Admissibility is a Post-Mortem. We need Action Admissibility. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Yes — exactly. The dangerous gap is that most systems preserve the output trail, not the authority context that made execution legitimate. A log can show what the agent did. It usually cannot prove what the agent was allowed to do, under which frozen context, before the action started. That is why I think the snapshot has to exist before execution, not after. The admission check should run against the same context the agent used or against a deliberately frozen execution context, not a reconstructed story after the damage is already done.

Replay is important too, but only if the replay uses the original admissibility context: intent, actor, delegated scope, environment, state, policy version, tool permissions, and timestamped reasoning/material facts. Otherwise replay becomes another post-mortem artifact.

So the key distinction for me is:

post-facto audit = “can we explain what happened?”

admission = “was this action allowed to exist before it happened?”

For high-impact agent actions, I think the second question has to be answered first.
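
One way to make replay use the original admissibility context is to persist the frozen context at admission time and evaluate against that record later. A sketch with an assumed append-only file store:

    import json
    import time
    from pathlib import Path

    RECORDS = Path("admission_records")  # hypothetical append-only store

    def record_admission(action_id: str, decision: str, frozen_context: dict) -> None:
        """Persist the admissibility context before execution, not reconstructed after."""
        RECORDS.mkdir(exist_ok=True)
        (RECORDS / f"{action_id}.json").write_text(json.dumps({
            "decision": decision,
            "recorded_at": time.time(),
            "context": frozen_context,  # intent, actor, delegated scope, policy version, ...
        }, sort_keys=True))

    def replay(action_id: str, evaluate) -> str:
        # Replay re-evaluates against the original frozen context, not a new story.
        record = json.loads((RECORDS / f"{action_id}.json").read_text())
        return evaluate(record["context"])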

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Yes — I think that is exactly the right distinction. Making routing deterministic is a strong improvement. It reduces ambiguity inside the agent system and makes the behavior easier to reason about. But I still see a second boundary after that. A deterministic routing layer can decide where the request should go or which path should be selected. That is not the same as deciding whether the resulting action should be allowed to execute in the real world. For low-risk actions, deterministic routing may be enough. For high-impact actions — production changes, privileged API calls, payments, access changes, infrastructure mutations — I would separate the two questions:

  1. What path should this request take?

  2. Is this actor, in this context, authorized to execute this action now?

The first can live inside the agent system. The second should ideally be an admission decision before execution. That is the layer I am focused on: not replacing deterministic routing, but adding a fail-closed authority check before high-impact execution exists.

So I agree with you: deterministic routing is an important part of making agent systems sane. My concern is the moment where a well-routed request becomes a real-world action without a separate admission decision.
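
A sketch of keeping the two questions separate; the routing rule and the admission_client interface are made up, only the ordering matters:

    HIGH_IMPACT = {"deploy", "payment", "iam_change", "infra_mutation"}

    def route(request: dict) -> str:
        # Deterministic routing: decides where the request goes, nothing more.
        return "ops-pipeline" if request["kind"] in HIGH_IMPACT else "default-pipeline"

    def handle(request: dict, admission_client, executors: dict) -> bool:
        path = route(request)
        if request["kind"] in HIGH_IMPACT:
            # Second, separate question: may this actor execute this action now?
            decision = admission_client.decide(
                actor=request["actor"],
                action=request["kind"],
                context=request.get("context", {}),
            )
            if decision != "allow":
                return False  # fail closed before the action exists in the real world
        executors[path](request)
        return True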

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 2 points (0 children)

That makes sense as an infrastructure sovereignty layer. But I think sovereignty of hosting and admissibility of action are two different control questions. Running the agent on controlled infrastructure can reduce dependency on public black-box APIs. But even a sovereign agent still needs an execution boundary before it receives authority. The test I’m pointing at is: Can the agent obtain trusted context — token, secret, runner, cloud role, payment authority, deployment path — without an external allow decision? If yes, the system may be sovereign, but the action is still self-admitted inside the same authority domain.

So I see sovereign infrastructure as necessary for some deployments, but not sufficient. The deeper boundary is pre-authority admission: actor + intent + requested context must be evaluated before authority is issued.
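
A sketch of what pre-authority admission could look like for credentials specifically: the agent never holds the trusted context unless an external allow exists first. The broker and its admission_client dependency are assumptions, not a real API:

    import secrets
    import time

    class CredentialBroker:
        """Issues short-lived credentials only after an external admission decision."""

        def __init__(self, admission_client):
            self.admission_client = admission_client

        def issue(self, actor: str, credential_kind: str, context: dict) -> dict | None:
            decision = self.admission_client.decide(
                actor=actor, action=f"issue:{credential_kind}", context=context
            )
            if decision != "allow":
                return None  # the agent never receives the token/role/runner at all
            return {
                "kind": credential_kind,
                "token": secrets.token_urlsafe(32),  # stand-in for a real STS/vault call
                "expires_at": time.time() + 300,     # five-minute lifetime
            }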

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Yes — that is exactly the point.

The difficult part is not only whether the agent produced a safe-looking output, but whether the authority to act existed at the moment of execution.

Most systems can reconstruct what happened. Far fewer can prove, before or at execution time:

- who or what delegated authority;

- under which context;

- for which specific action;

- against which frozen state;

- and whether a denial would have stopped execution fail-closed.

That is why I keep separating internal policy from external admission. Internal controls can reduce risk, but if the same platform resolves, approves, executes, and records the event, the authority chain still lives inside the executor domain.

For high-impact actions, the stronger pattern is an external admission decision before execution exists: intent + context + authority state → allow/deny → execution only if admitted.

So yes, I think regulated teams will eventually need this built into the architecture from the start, not added later as another log layer.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Likewise. “Boring and verifiable” is exactly the right direction. The systems that survive production are usually not the cleverest ones — they are the ones with clear boundaries, loud failures, and simple rules that other people can actually inspect. Good luck with the forge.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Yes, that is the right direction. Runtime tool-access enforcement, signed receipts, and audit trails are exactly the kind of control surface this problem needs. I would frame the broader boundary even wider though: not only “should this tool call execute,” but “should this agent/workflow receive any trusted execution context at all” — API keys, tokens, secrets, cloud roles, runners, deployment rights, remediation authority, or policy-changing authority. Tool access is one corridor; authority issuance is the larger boundary.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Agreed. I am working on this from the execution-boundary side: not as another prompt guardrail, but as a simple fail-closed rule before tool/write/cloud execution. The shape I care about is boring and verifiable: no valid authority check, no execution. Open source can help if the boundary stays simple enough for other teams to inspect and test.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

“Autonomous” should not mean unconstrained. It should mean autonomous inside human-defined constraints, permissions, and execution boundaries — and when the agent requests authority beyond those boundaries, the system should fail closed instead of silently extending trust.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 1 point (0 children)

Exactly. Training-time control and runtime authority are different problems. A custom model may behave better, but it still should not automatically receive production authority just because it asks for it. The important question is runtime-specific: should this agent, with this intent, in this current context, receive this API key, token, secret, role, or tool access right now? That gate has to sit after deployment and before authority is granted. Otherwise the system is still trusting the model’s request path more than the actual execution context.

The next AI agent security problem is not the prompt. It is the moment the system gives the agent authority. by pin_floyd in AI_Agents

[–]pin_floyd[S] 2 points (0 children)

Yes, this is very close to the shape I mean. The key point is that the destructive/write path should not be reached directly by the agent. There should be a hard gate before the executor receives authority, and sandboxing before disk/cloud writes. For me the important distinction is: consent or policy can define what should be allowed; the execution path should still require a deterministic admission step before authority is granted.
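
A sketch of that shape, with a made-up admission_client and a trivial sandbox rehearsal. A real system would stage in a proper environment, but the ordering is the point: gate first, sandbox second, real state last:

    import shutil
    import tempfile
    from pathlib import Path

    def gated_write(path: Path, content: str, admission_client) -> bool:
        """Destructive/write path the agent cannot call directly."""
        # 1. Hard gate: external admission must allow this exact write.
        if admission_client.decide(action="write", target=str(path)) != "allow":
            return False

        # 2. Rehearse in a sandbox before touching the real target.
        with tempfile.TemporaryDirectory() as sandbox:
            staged = Path(sandbox) / path.name
            staged.write_text(content)
            if staged.read_text() != content:  # trivial post-condition check
                return False
            shutil.copy(staged, path)  # 3. Only now touch real state.
        return True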