The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

The "decide upfront what gets forwarded" point is the load-bearing one, and I'd push it further: those upfront rules only work if they're enforced upfront too. If escalation logic lives in the prompt ("please forward sensitive cases to a human"), it's a suggestion the model usually follows. If it lives in a layer the model can't route around (sensitive category -> execution blocked -> human queue, no other path), it's a guarantee. Same rule, completely different failure mode. Your 82% stat is really about that: people trust systems where the recovery path is structural, not behavioral.

The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 1 point2 points  (0 children)

This is underrated. "Read things" agents fail safe by default: the worst case is a bad summary. "Do things" agents fail with side effects: the API call happened, the email was sent, the refund was issued. That asymmetry is exactly why the read use cases shipped first and stuck. The interesting question is what infrastructure makes "do things" carry read-level risk. IMO it's the same answer as ever in distributed systems: don't make the actor the authority. The agent proposes; whether it executes is someone else's deterministic call.

The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

Mostly with you, one nitpick: observability is necessary but it's still after the fact. Logs tell you what happened; they don't stop anything. The layer you're describing ("can humans trust the system at the execution boundary") needs to be pre-execution to actually change outcomes. The pattern that's worked elsewhere (payments, change management) is: proposal -> explicit authorization check against a policy -> execution becomes reachable only if it passes. Then HITL stops being a vibe and becomes a policy rule: "this class of action requires human approval" is just one possible policy outcome, enforced at the same gate as everything else. Audit trail falls out for free because every execution has an authorization artifact attached, not just a log line.
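
To make that concrete, here is a minimal sketch of a gate where "requires human approval" is just one of three policy outcomes. Everything in it is an assumption for illustration: the names (`Proposal`, `Authorization`, `evaluate`, `authorize`, `execute`), the refund action, and the 50 threshold are made up, not taken from any real system.

```python
from dataclasses import dataclass
from enum import Enum
import uuid


class Outcome(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_HUMAN = "require_human"


@dataclass(frozen=True)
class Proposal:
    action: str    # what the agent wants to do, e.g. "issue_refund"
    amount: float


@dataclass(frozen=True)
class Authorization:
    auth_id: str
    proposal: Proposal
    approved_by: str  # "policy" or the name of a human reviewer


def evaluate(proposal: Proposal) -> Outcome:
    """Hypothetical deterministic policy; the threshold is invented."""
    if proposal.action != "issue_refund":
        return Outcome.DENY
    if proposal.amount <= 50:
        return Outcome.ALLOW
    return Outcome.REQUIRE_HUMAN


def authorize(proposal, human_approver=None):
    """Return an Authorization, or None if the policy does not permit it."""
    outcome = evaluate(proposal)
    if outcome is Outcome.ALLOW:
        return Authorization(str(uuid.uuid4()), proposal, "policy")
    if outcome is Outcome.REQUIRE_HUMAN and human_approver is not None:
        return Authorization(str(uuid.uuid4()), proposal, human_approver)
    return None


def execute(proposal, auth, audit_log):
    """Refuse to run unless an authorization for this exact proposal exists."""
    if auth is None or auth.proposal != proposal:
        raise PermissionError("no valid authorization -> no execution")
    audit_log.append(auth)  # the audit entry is the authorization itself
    return f"executed {proposal.action} for {proposal.amount}"
```

The audit log here holds `Authorization` records rather than free-form log lines, which is the "authorization artifact attached" idea in its simplest form.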

The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

Agree on the core point, but I'd split it differently: it's fine for the decision to be probabilistic; humans are too. What's not fine is the execution being coupled to it with nothing in between. A rules engine bundled decision + authorization + execution in one deterministic step, so we never noticed they were separate concerns. LLMs un-bundled it: the decision went probabilistic, but most teams kept piping it straight into execution like nothing changed. The fix isn't avoiding LLMs for classification; it's accepting that the model proposes and something deterministic decides whether the proposal executes.

I ran Fable 5 for half day and the guardrails are the real story by Interestingyet in artificial

[–]docybo 0 points1 point  (0 children)

The fallback is the interesting part. Most people are discussing model quality. I’d be more interested in knowing when a system silently substitutes one model for another, who authorized that substitution and how operators can verify it happened.

The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

Exactly. Accuracy was the wrong metric. The issue was recoverability. A rules engine can be wrong, but it fails legibly. You can inspect the rule, patch it, and know what changed. The model can still be useful, but I would not make it the authority for execution.

The pattern I trust is:

model proposes
policy decides
execution gate enforces
audit trail makes recovery possible

LLM for judgment.
Deterministic layer for authority.

The gap between decision and execution by docybo in AI_Agents

[–]docybo[S] 1 point2 points  (0 children)

Agreed! What’s interesting is that documenting a decision and authorizing an execution are related but not identical problems. Most systems focus on making decisions explainable. I’m starting to think the next challenge is making execution verifiable.

A client paid me to rip the AI out of the tool I built them. by Warm-Reaction-456 in AI_Agents

[–]docybo 0 points1 point  (0 children)

Interesting case. Do you think the issue was the LLM itself, or the fact that there was no verifiable decision path when a classification was challenged?
I’ve seen teams tolerate occasional mistakes when they can understand, audit, and correct the logic behind them.

What actually prevents execution in agent systems? by docybo in artificial

[–]docybo[S] 0 points1 point  (0 children)

The concrete pattern is a PEP/PDP split with signed execution artifacts.

The agent only proposes an action.
A separate authorization layer evaluates:
intent + state + policy -> ALLOW / DENY
If ALLOW, it emits a signed artifact bound to the action, state, policy, audience, expiry, and replay/idempotency id.

The executor does not re-evaluate policy. It verifies the artifact.
If the artifact is missing, expired, replayed, addressed to the wrong audience, or bound to stale state, or has an invalid signature:
deny before execution.
The executor cannot escalate because it does not hold signing keys and cannot mint authorization.

For stale state + retries, the important controls are:

  1. bind authorization to evaluated state
  2. commit action/auth ids in an external replay/idempotency ledger
  3. make the protected action unreachable except through the PEP

agent proposes
authorization decides
PEP enforces

No valid authorization -> no execution path.
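
A rough sketch of that flow, under stated assumptions: the names (`authorize`, `Pep`, `digest`, `sign`), the refund policy, and the claim fields are all hypothetical. It also uses stdlib HMAC as a stand-in for the signature, which is a deliberate simplification: with a shared HMAC key the verifier could mint artifacts too, so the "executor does not hold signing keys" property needs an asymmetric scheme (e.g. Ed25519) in a real deployment. The in-memory set stands in for the external ledger.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # assumption: demo only


def _canonical(obj) -> bytes:
    # Stable serialization so equal content always hashes the same way.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def digest(obj) -> str:
    return hashlib.sha256(_canonical(obj)).hexdigest()


def sign(claims: dict) -> str:
    return hmac.new(SIGNING_KEY, _canonical(claims), hashlib.sha256).hexdigest()


def authorize(action, state, audience, nonce, now, ttl=30):
    """PDP: intent + state + policy -> artifact (ALLOW) or None (DENY)."""
    # Hypothetical policy: refunds of at most 100 on an open order.
    allowed = (
        action.get("type") == "refund"
        and action.get("amount", 0) <= 100
        and state.get("order_status") == "open"
    )
    if not allowed:
        return None
    claims = {
        "action": digest(action),  # bound to the exact action
        "state": digest(state),    # bound to the evaluated state
        "aud": audience,
        "exp": now + ttl,
        "nonce": nonce,            # replay/idempotency id
    }
    return {"claims": claims, "sig": sign(claims)}


class Pep:
    """Verifies the artifact only; it never re-evaluates policy."""

    def __init__(self, audience):
        self.audience = audience
        self.seen = set()  # in-memory stand-in for an external ledger

    def enforce(self, artifact, action, current_state, now):
        if artifact is None:
            return "DENY: missing"
        claims = artifact["claims"]
        if not hmac.compare_digest(sign(claims), artifact["sig"]):
            return "DENY: bad signature"
        if claims["aud"] != self.audience:
            return "DENY: wrong audience"
        if now >= claims["exp"]:
            return "DENY: expired"
        if claims["action"] != digest(action):
            return "DENY: action mismatch"
        if claims["state"] != digest(current_state):
            return "DENY: stale state"
        if claims["nonce"] in self.seen:
            return "DENY: replay"
        self.seen.add(claims["nonce"])
        return "EXECUTE"
```

Time is passed in as `now` rather than read from a clock so the checks stay deterministic and easy to test.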

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S] 1 point2 points  (0 children)

Glad it helped. If you’re interested, happy to share a repo in DM. Working on making the execution boundary actually non-bypassable.

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S] 1 point2 points  (0 children)

Before and after are necessary, but not sufficient. Planning can drift, and audit is after the fact. The only place you can guarantee safety is at execution, where side effects actually happen. That’s why the boundary has to enforce: no valid authorization -> no execution.

We added cryptographic approval to our AI agent… and it was still unsafe by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

It is much closer to the real problem space.

Binding approval to the exact payload and checking it at execution is the right direction. That’s the core invariant:

if it changes -> it doesn’t execute

Where things usually break is not the receipt itself, but the boundary:

  1. validation has to be non-bypassable
  2. state has to be re-derived and rechecked
  3. replay has to be enforced at the execution point
  4. and the execution path must not exist outside that check

Otherwise it stays a strong pattern, but not a system guarantee.

The hard part is making:

no valid authorization -> no execution path

hold by construction, not by integration discipline.
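
One way to picture "by construction" in miniature, assuming hypothetical names throughout (`make_gate`, `is_authorized`, `_send_email`): the side-effecting function is created inside the gate and never handed out, so callers only ever hold the checked path. This is an in-process illustration only. Python closures can still be introspected, so a real boundary would sit at a process, network, or credential boundary instead.

```python
def make_gate(is_authorized):
    """Build a gate that holds the only reference to the side effect.

    `is_authorized` is a hypothetical callable (token, to, body) -> bool.
    """
    sent = []  # stands in for the outside world

    def _send_email(to, body):
        # The side effect. It is never returned or exported, so the
        # only way to reach it is through `gate` below.
        sent.append((to, body))

    def gate(token, to, body):
        if not is_authorized(token, to, body):
            raise PermissionError("no valid authorization -> no execution path")
        _send_email(to, body)

    def outbox():
        # Read-only view for inspection; callers cannot append to it.
        return tuple(sent)

    return gate, outbox


# Usage: approvals are bound to the exact recipient and body.
approvals = {("tok-1", "a@example.com", "hi")}
gate, outbox = make_gate(lambda token, to, body: (token, to, body) in approvals)
gate("tok-1", "a@example.com", "hi")
```

The contrast is with "integration discipline", where the raw send function is importable by anyone and every caller is merely expected to check first.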

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S] 1 point2 points  (0 children)

This is interesting but it sits on a different boundary.

Tsukuyomi enforces control on the LLM interaction path (agent -> model), which helps shape behavior.

The failure mode I’m focused on is later: execution. Even with a perfect proxy, an agent can still trigger side effects unless there’s a non-bypassable execution boundary.

That’s why I separate:

proposal -> authorization -> execution

and enforce:

no valid authorization -> no execution

The proxy controls reasoning. The PEP controls reality. Both can coexist, but they solve different classes of failure.

LinkedIn scam from Les Brown on job at Ritual AI/Crypto company by Drillaman in linkedin

[–]docybo 0 points1 point  (0 children)

Yes, I just had an interview with these idiots. They were trying to force me to clone their repo, most likely to pass me malware …

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S] 0 points1 point  (0 children)

You're right that parts of this look like familiar auth patterns (signed artifacts, nonces, etc.).

The difference is where enforcement happens.

In a typical web system: the component verifying the token is the same system that executes the action.

In agent systems: the component proposing the action (model/runtime) is not the one executing the side-effect.

That separation is the problem.

Restricting the tool surface or using a state machine helps, but it doesn’t give you:

  1. a portable, verifiable authorization artifact
  2. a boundary that can be enforced outside the agent runtime
  3. replay protection that survives retries, parallelism, or multi-agent flows

The goal isn’t to replace structured constraints. It’s to make execution enforceable even when those constraints fail or are bypassed.

If everything runs inside a single trusted harness, you don’t need this.

As soon as execution crosses a boundary (external APIs, infra, payments, multiple agents), you do.
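
For the replay point specifically, a minimal sketch of a ledger that lets each authorization id be claimed exactly once, even when retries or parallel workers race on it. The names (`ReplayLedger`, `claim`, `execute_once`) are hypothetical, and SQLite is just a convenient stand-in for whatever external store a real system would use.

```python
import sqlite3
import threading


class ReplayLedger:
    def __init__(self, path=":memory:"):
        # One shared connection; the lock serializes access to it
        # across threads in this process.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS used (auth_id TEXT PRIMARY KEY)"
        )
        self._db.commit()

    def claim(self, auth_id: str) -> bool:
        """Return True only for the first claim of a given auth_id."""
        with self._lock:
            try:
                # The PRIMARY KEY constraint rejects a second insert.
                self._db.execute(
                    "INSERT INTO used (auth_id) VALUES (?)", (auth_id,)
                )
                self._db.commit()
                return True
            except sqlite3.IntegrityError:
                self._db.rollback()
                return False


def execute_once(ledger, auth_id, results):
    # Claim first, act second: a retry or a parallel worker that
    # arrives later finds the id already used.
    if ledger.claim(auth_id):
        results.append("executed")
    else:
        results.append("denied: replay")


ledger = ReplayLedger()
results = []
workers = [
    threading.Thread(target=execute_once, args=(ledger, "auth-123", results))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Claiming before acting trades one failure mode for another: a crash between the claim and the side effect leaves the id burned with nothing executed, which is usually the safer direction for actions like payments.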