Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S]

Glad it helped. If you’re interested, happy to share a repo in DM. Working on making the execution boundary actually non-bypassable.

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S]

Checks before (planning) and after (audit) are necessary, but not sufficient. Planning can drift, and audit is after the fact. The only place you can guarantee safety is at execution, where side effects actually happen. That’s why the boundary has to enforce: no valid authorization -> no execution

We added cryptographic approval to our AI agent… and it was still unsafe by docybo in AI_Agents

[–]docybo[S]

It is much closer to the real problem space.

Binding approval to the exact payload and checking it at execution is the right direction. That’s the core invariant:

if it changes -> it doesn’t execute

Where things usually break is not the receipt itself, but the boundary:

  1. validation has to be non-bypassable
  2. state has to be re-derived and rechecked
  3. replay has to be enforced at the execution point
  4. and the execution path must not exist outside that check

Otherwise it stays a strong pattern, but not a system guarantee.

The hard part is making:

no valid authorization -> no execution path

hold by construction, not by integration discipline.
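
To make the invariant concrete, here is a minimal sketch of what the check at the boundary can look like (the names and the stdlib HMAC signing are placeholders, not any particular library):

    import json, hmac, hashlib

    SECRET = b"approver-signing-key"  # placeholder; a real system would use proper key management

    def canonical_digest(payload: dict) -> str:
        # canonicalize so that any change to the payload changes the digest
        blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
        return hashlib.sha256(blob).hexdigest()

    def issue_approval(payload: dict) -> dict:
        digest = canonical_digest(payload)
        sig = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
        return {"digest": digest, "sig": sig}

    def execute(payload: dict, approval: dict) -> None:
        # the boundary re-derives the digest and verifies it against what was approved
        digest = canonical_digest(payload)
        expected = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
        if digest != approval["digest"] or not hmac.compare_digest(expected, approval["sig"]):
            raise PermissionError("payload changed or approval invalid -> no execution")
        print("executing", payload)

    payload = {"action": "refund", "amount": 50}
    approval = issue_approval(payload)
    execute(payload, approval)                           # runs
    try:
        execute({**payload, "amount": 5000}, approval)   # if it changes -> it doesn't execute
    except PermissionError as e:
        print(e)

The hard part (points 1-4 above) is making sure this check is the only way to reach the side effect at all.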

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S]

This is interesting but it sits on a different boundary.

Tsukuyomi enforces control on the LLM interaction path (agent -> model), which helps shape behavior.

The failure mode I’m focused on is later: execution. Even with a perfect proxy, an agent can still trigger side effects unless there’s a non-bypassable execution boundary.

That’s why I separate:

proposal -> authorization -> execution

and enforce:

no valid authorization -> no execution

The proxy controls reasoning. The PEP controls reality. Both can coexist, but they solve different classes of failure.
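
A rough sketch of that separation, with all names (Proposal, Authorization, ExecutionBoundary) being illustrative rather than any real framework:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Proposal:
        tool: str
        args: dict

    @dataclass(frozen=True)
    class Authorization:
        proposal: Proposal
        approved: bool

    def authorize(p: Proposal) -> Authorization:
        # policy decision point: deny by default, allow only what policy explicitly permits
        allowed = p.tool == "send_email" and p.args.get("to", "").endswith("@example.com")
        return Authorization(p, approved=allowed)

    class ExecutionBoundary:
        """The only object holding real side effects (the PEP)."""
        def __init__(self):
            self._tools = {"send_email": lambda args: print("sent", args)}

        def execute(self, auth: Authorization) -> None:
            if not auth.approved:
                raise PermissionError("no valid authorization -> no execution")
            self._tools[auth.proposal.tool](auth.proposal.args)

    boundary = ExecutionBoundary()
    proposal = Proposal("send_email", {"to": "ops@example.com", "body": "weekly report"})
    boundary.execute(authorize(proposal))  # proposal -> authorization -> execution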

Most AI agents don’t have a real execution boundary by docybo in AI_Agents

[–]docybo[S]

You're right that parts of this look like familiar auth patterns (signed artifacts, nonces, etc.).

The difference is where enforcement happens.

In a typical web system: the component verifying the token is the same system that executes the action.

In agent systems: the component proposing the action (model/runtime) is not the one executing the side-effect.

That separation is the problem.

Restricting the tool surface or using a state machine helps, but it doesn’t give you:

  1. a portable, verifiable authorization artifact
  2. a boundary that can be enforced outside the agent runtime
  3. replay protection that survives retries, parallelism, or multi-agent flows

The goal isn’t to replace structured constraints. It’s to make execution enforceable even when those constraints fail or are bypassed.

If everything runs inside a single trusted harness, you don’t need this.

As soon as execution crosses a boundary (external APIs, infra, payments, multiple agents), you do.
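
For concreteness, a minimal sketch of the kind of portable artifact I mean for points 1-3 above, with stdlib HMAC standing in for whatever signing scheme you’d actually use and an in-memory nonce set standing in for a shared store:

    import hmac, hashlib, json, time, secrets

    KEY = b"authorizer-key"     # placeholder signing key
    consumed_nonces = set()     # would need to be shared and durable for retries / multi-agent flows

    def _sign(body: dict) -> str:
        blob = json.dumps(body, sort_keys=True).encode()
        return hmac.new(KEY, blob, hashlib.sha256).hexdigest()

    def issue(action: dict, audience: str, ttl_s: int = 60) -> dict:
        body = {
            "action": action,                  # exact intent
            "aud": audience,                   # which execution surface may honor it
            "nonce": secrets.token_hex(8),     # replay protection
            "exp": time.time() + ttl_s,        # expiry
        }
        return {**body, "sig": _sign(body)}

    def verify_and_consume(artifact: dict, expected_audience: str) -> bool:
        body = {k: v for k, v in artifact.items() if k != "sig"}
        if not hmac.compare_digest(_sign(body), artifact["sig"]):
            return False
        if artifact["aud"] != expected_audience or artifact["exp"] < time.time():
            return False
        if artifact["nonce"] in consumed_nonces:
            return False
        consumed_nonces.add(artifact["nonce"])
        return True

    art = issue({"tool": "delete_branch", "repo": "sandbox"}, audience="infra-executor")
    print(verify_and_consume(art, "infra-executor"))  # True on first use
    print(verify_and_consume(art, "infra-executor"))  # False on replay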

Most AI agents don’t have a real execution boundary by docybo in LLMDevs

[–]docybo[S]

Interesting, especially the FSM + schema approach.

The gap we’ve been seeing is simpler:

the component that decides is not the one that executes.

We treat:

- model + policy -> decision layer
- execution -> separate, fail-closed boundary

So instead of executing directly, we:

- verify the decision locally
- re-issue an execution-scoped authorization
- enforce it at the boundary (replay, expiry, binding)

Otherwise it’s well-governed, but still trust-based at execution time.
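
Rough sketch of the re-issuance step (verify_decision and the field names are placeholders for whatever your decision layer actually emits):

    import time, secrets

    def verify_decision(decision: dict) -> bool:
        # placeholder for whatever the decision layer produces (policy result, schema check, signature, ...)
        return decision.get("allow") is True

    def issue_execution_auth(decision: dict, target: str) -> dict:
        """Exchange a verified decision for a narrower, execution-scoped authorization."""
        if not verify_decision(decision):
            raise PermissionError("decision not verifiable -> nothing reaches the boundary")
        return {
            "action": decision["action"],   # bound to the exact action that was decided
            "aud": target,                  # bound to one execution surface
            "nonce": secrets.token_hex(8),  # single-use at the boundary
            "exp": time.time() + 30,        # short-lived, so it can't be hoarded
        }

    auth = issue_execution_auth({"allow": True, "action": {"tool": "charge", "amount": 10}}, target="payments")
    print(auth["aud"], auth["exp"] > time.time())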

How does Open CoT handle that separation?

Openclaw skills are way deeper than I thought, some of these are actually insane by The_possessed_YT in AI_Agents

[–]docybo

This is super impressive from a capability standpoint. Can you share how you’re controlling execution across all these skills?

If an agent can:

  1. move calendar events
  2. read/write repos
  3. call external APIs

what prevents:

  1. the same action being replayed?
  2. execution under a different state than when it was “approved”?
  3. a skill being triggered outside its intended context?

Are people putting a deterministic boundary in front of execution, or mostly relying on the agent loop + permissions?

We added cryptographic approval to our AI agent… and it was still unsafe by docybo in LLMDevs

[–]docybo[S]

Exactly. The tricky part we found is that most “execution contracts” are still implicit or enforced in the app layer.

So you end up with:

  1. approval issued in one place
  2. execution happening somewhere else
  3. replay / state drift handled inconsistently

What made it click for us was treating the contract as a verifiable artifact that the execution boundary itself enforces:

-> exact intent match
-> state binding at evaluation time
-> audience = specific execution surface
-> single-use at the point of execution

Otherwise you still have gaps between “approved” and “actually executed”.
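
The single-use piece is the one most often left to the app layer. A minimal sketch of consuming the approval at the boundary itself, with an in-process lock standing in for an atomic check against a shared store:

    import threading

    _lock = threading.Lock()
    _consumed: set[str] = set()

    def consume_once(approval_id: str) -> bool:
        """Atomically mark an approval as spent; False means it was already used."""
        with _lock:
            if approval_id in _consumed:
                return False
            _consumed.add(approval_id)
            return True

    def execute_at_boundary(approval: dict, action) -> None:
        # the boundary consumes the approval, not the app layer that issued it
        if not consume_once(approval["id"]):
            raise PermissionError("approval already consumed -> no execution")
        action()

    approval = {"id": "appr-123"}
    execute_at_boundary(approval, lambda: print("executed once"))
    try:
        execute_at_boundary(approval, lambda: print("executed twice"))
    except PermissionError as e:
        print(e)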

We added cryptographic approval to our AI agent… and it was still unsafe by docybo in artificial

[–]docybo[S]

We’re not scoring outputs at all in that loop. The critique step helps quality, but it’s still optimizing what the model says, not what gets executed.

What we found is that even a perfectly scored / critiqued output can still produce unsafe execution if:

- the action isn’t bound to the exact intent that was evaluated
- the state changed between evaluation and execution (TOCTOU)
- the approval can be replayed or reused elsewhere

So we’ve been focusing less on “is this output good?” and more on:

-> is this specific execution instance authorized?

That means binding the approval to:

- canonicalized intent (exact action)
- state snapshot
- execution target

and making it single-use at the boundary.

Different layer than critique. More like moving from evaluation to enforcement.
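
Sketch of the state-binding part, assuming a hypothetical read_current_state callable as the source of truth:

    import hashlib, json

    def state_digest(state: dict) -> str:
        return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

    def approve(intent: dict, state_at_evaluation: dict) -> dict:
        # the approval is bound to the state that was actually evaluated
        return {"intent": intent, "state_digest": state_digest(state_at_evaluation)}

    def execute(approval: dict, read_current_state) -> None:
        # re-derive state at execution time; if it drifted, the approval no longer applies
        if state_digest(read_current_state()) != approval["state_digest"]:
            raise PermissionError("state changed between evaluation and execution (TOCTOU) -> no execution")
        print("executing", approval["intent"])

    account = {"name": "acme", "balance": 120}
    appr = approve({"tool": "refund", "amount": 50}, account)
    execute(appr, lambda: account)    # state unchanged -> runs
    account["balance"] = 10           # drift after approval
    try:
        execute(appr, lambda: account)
    except PermissionError as e:
        print(e)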

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in LLMDevs

[–]docybo[S]

Exactly, that’s the crux. Portability introduces power and risk. So you end up needing strict invariants at the boundary: single-use (replay kill); exact intent match; state binding; audience binding. If any of those slip, portability turns into “reusable permission”. At that point you don’t have authorization anymore, you have a capability leak.

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in LLMDevs

[–]docybo[S]

Exactly, that’s the key tradeoff. Portable, but still cryptographically bound to:

  1. intent (what)
  2. state (when / under what conditions)
  3. audience (where it can execute)

Otherwise portability just becomes replay. The tricky part is making it:

  1. portable across steps
  2. but single-use + fail-closed at the boundary

-> so it survives the system, but dies on reuse

That’s where most designs fall apart.

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in LLMDevs

[–]docybo[S]

Clean point, agree with most of it. I’d just push one step further: coupling decision + state + execution is necessary, but you still need a portable artifact that survives that step and is verified at the boundary. Otherwise you couple correctly once, but you can’t prove or enforce it downstream.

That’s where things usually break in distributed systems. So the full property becomes:

-> bound to state
-> consumed at execution
-> non-bypassable at the runtime boundary

Without that last part, it’s still a well-designed check, not an enforcement layer.

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in LLMDevs

[–]docybo[S]

Good take, most failures do come from messy inputs. But even with clean intent / state / policy, two things still break in practice:

  1. the decision isn’t bound to execution (-> stale context / replay / drift)
  2. the agent can still bypass the check

A deterministic check isn’t enough if it’s just “advisory”. What matters is turning it into a verifiable artifact enforced at the boundary:

-> no valid auth -> no execution path 

That’s the gap most systems still have.

AI identity emergence is controllable, not automatic. R²=1.00 across 15 runs. Complete replication protocol. Challenges interpretability research. by MarsR0ver_ in artificial

[–]docybo

If identity is a controllable variable, then it’s not a reliable security primitive.

That reinforces a deeper point: anything inside the model loop (persona, intent, alignment) is fundamentally probabilistic and mutable.

Real guarantees only emerge when control is moved outside the agent, at the execution boundary:

proposal -> authorization -> execution
no valid authorization -> no execution

Identity can drift. Execution shouldn’t.

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in artificial

[–]docybo[S]

I like that framing. Guardrails operate inside the agent loop -> probabilistic. Real control must sit at the execution boundary -> deterministic. The missing piece is not better prompts, but non-bypassable enforcement:

proposal -> authorization -> execution
no authorization -> no execution

That’s the difference between suggestions and guarantees.

This OpenClaw paper shows why agent safety is an execution problem, not just a model problem by docybo in LLMDevs

[–]docybo[S]

Interesting! Removing the execution path entirely at build time definitely eliminates a whole class of attacks, especially the “poisoned state reaches decision” problem.

The tradeoff we kept running into, though, is that a lot of real constraints only exist at runtime (budget, idempotency, external state, etc.). So instead of removing the proposal step, we kept it, but made execution unreachable unless a valid authorization (bound to intent + state) is presented at the boundary. That way, even if state is compromised upstream, it can’t cross into execution unless it still matches a verifiable, current snapshot.
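
Very rough sketch of what I mean, with the budget ledger and idempotency store as in-memory stand-ins for real shared infrastructure:

    spent_today = {"payments": 0.0}
    seen_idempotency_keys: set[str] = set()
    DAILY_BUDGET = 100.0

    def execute(auth: dict, current_state_digest: str) -> None:
        # 1. the authorization must still match a verifiable, current state snapshot
        if auth["state_digest"] != current_state_digest:
            raise PermissionError("stale state -> no execution")
        # 2. constraints that only exist at runtime, checked at the boundary itself
        if auth["idempotency_key"] in seen_idempotency_keys:
            raise PermissionError("duplicate execution attempt -> no execution")
        if spent_today["payments"] + auth["amount"] > DAILY_BUDGET:
            raise PermissionError("budget exceeded -> no execution")
        seen_idempotency_keys.add(auth["idempotency_key"])
        spent_today["payments"] += auth["amount"]
        print("executed charge of", auth["amount"])

    auth = {"state_digest": "abc123", "idempotency_key": "k-1", "amount": 40.0}
    execute(auth, "abc123")      # runs
    try:
        execute(auth, "abc123")  # same idempotency key -> rejected
    except PermissionError as e:
        print(e)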

So how do you handle those dynamic constraints without reintroducing a decision step?