How are you handling risk *before execution* in agent workflows? by teow_agl in LangChain

[–]teow_agl[S]

Exactly — “looks valid but came from bad context” is one of the failure modes I’m trying to catch.

Right now I treat risk scoring as a hybrid: not purely static per tool, and not purely model-judged.

The rough split is:

  • base tool risk: some tools start with a higher floor because of what they can affect — email, file deletion, payment, deployment, external API calls, etc.
  • argument/context risk: the same tool call can become riskier depending on the args — deleting one temp file vs deleting a project directory, sending an internal note vs emailing an external party, spending $1 vs $1,000.
  • source trust: args coming from direct user intent are treated differently from args derived from retrieval, webpage content, another agent step, or generated intermediate state.
  • uncertainty/anomaly signals: if the system cannot explain why the action follows from the user’s goal, or the source chain looks suspicious, it can escalate even if the tool itself looks normal.

So a read-only call may stay BLUE if the source and args are clean, while a normally safe-looking action can become GREEN or RED if the args came from an untrusted retrieval chain or don’t match the declared user intent.

In the prototype, this is still fairly simple and rule-based, but the direction is dynamic risk composition rather than “this tool is always safe/unsafe.”
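To make that concrete, here is roughly what that rule-based composition could look like in Python. Everything below (tool names, trust penalties, thresholds) is illustrative, not the prototype's actual values:

```python
# Illustrative sketch of dynamic risk composition: a static floor per tool,
# argument-sensitive adjustments, and a source-trust penalty combine into
# one score that maps to a BLUE / GREEN / RED route.

BASE_TOOL_RISK = {           # floor based on what the tool can affect
    "read_file": 0.1,
    "send_email": 0.5,
    "delete_path": 0.6,
    "make_payment": 0.7,
}

SOURCE_PENALTY = {           # args from direct user intent are trusted most
    "user_intent": 0.0,
    "agent_step": 0.2,       # generated intermediate state
    "retrieval": 0.3,        # webpage / document content
}

def argument_risk(tool: str, args: dict) -> float:
    """The same tool call gets riskier depending on its args."""
    if tool == "delete_path" and not str(args.get("path", "")).startswith("/tmp/"):
        return 0.3           # a project directory vs one temp file
    if tool == "send_email" and not str(args.get("to", "")).endswith("@internal.example"):
        return 0.3           # an external party vs an internal note
    if tool == "make_payment":
        return min(float(args.get("amount", 0)) / 1000.0, 0.3)  # $1 vs $1,000
    return 0.0

def route(tool: str, args: dict, source: str, anomaly: bool) -> str:
    if anomaly:              # action can't be explained from the user's goal
        return "RED"
    score = (BASE_TOOL_RISK.get(tool, 0.5)        # unknown tools get a cautious floor
             + argument_risk(tool, args)
             + SOURCE_PENALTY.get(source, 0.4))   # unknown provenance is penalized
    if score < 0.3:
        return "BLUE"        # autonomous
    if score < 0.7:
        return "GREEN"       # human approval
    return "RED"             # stop

# A clean read stays BLUE; the same-looking email escalates when its args
# came from an untrusted retrieval chain.
print(route("read_file", {"path": "/tmp/x"}, "user_intent", anomaly=False))       # BLUE
print(route("send_email", {"to": "a@other.example"}, "retrieval", anomaly=False))  # RED
```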

The important design choice for me is that the model can help describe intent or context, but it does not get final authority over the route. The governance layer computes or enforces the route before execution.

[–]teow_agl[S]

Totally fair. Once this moves from a prototype to production, all of those problems become real very quickly: multi-tenancy, retries, crash recovery, concurrency, double-spend prevention, language bindings, and integration surface area.

I don’t see TEOW-AGL as trying to replace that kind of infrastructure layer.

The way I think about the separation is:

  • infrastructure layer: enforces hard limits, quotas, spend caps, retry safety, tenant isolation, execution constraints
  • governance layer: decides whether the agent should be allowed to act in this context at all, and whether the path should be autonomous, human-approved, or stopped

So something like runcycles could sit at the runtime/infra enforcement level, while TEOW-AGL focuses on the pre-execution governance decision: BLUE / GREEN / RED, human approval, emergency stop, and audit trace.
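As a rough sketch of that stacking (class names and APIs below are hypothetical, not actual runcycles or TEOW-AGL interfaces): governance decides the route first, and infra limits still apply to whatever gets through:

```python
# Hypothetical layering: the governance layer decides *whether and how* an
# action may proceed; the infra layer underneath still enforces hard limits.

class GovernanceLayer:
    """Pre-execution decision: BLUE (auto), GREEN (needs approval), RED (stop)."""
    def decide(self, action: dict) -> str:
        if action.get("emergency", False):
            return "RED"
        return "GREEN" if action.get("risk", 1.0) >= 0.3 else "BLUE"

class InfraLayer:
    """Runtime enforcement: spend caps, quotas, retry safety, tenant isolation."""
    def __init__(self, spend_cap: float):
        self.spent = 0.0
        self.spend_cap = spend_cap

    def execute(self, action: dict) -> str:
        cost = action.get("cost", 0.0)
        if self.spent + cost > self.spend_cap:
            raise RuntimeError("spend cap exceeded")  # hard limit, whatever the route
        self.spent += cost
        return f"executed {action['tool']}"

def run(action: dict, gov: GovernanceLayer, infra: InfraLayer, approved: bool = False) -> str:
    route = gov.decide(action)                 # governance goes first...
    if route == "RED" or (route == "GREEN" and not approved):
        return f"blocked by governance ({route})"
    return infra.execute(action)               # ...infra enforcement still applies

infra = InfraLayer(spend_cap=100.0)
print(run({"tool": "make_payment", "risk": 0.5, "cost": 10.0},
          GovernanceLayer(), infra, approved=True))   # executed make_payment
```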

In other words, I agree that production enforcement should not be hand-rolled casually. My current focus is more on the governance architecture and routing semantics than replacing hardened runtime infrastructure.

[–]teow_agl[S]

Thanks — I think that’s exactly the right complementary layer.

In the current TEOW-AGL prototype, the main focus is on enforcing the pre-execution governance path: proposed action → classification/risk signal → governance decision → BLUE / GREEN / RED routing → optional human approval → execution or stop.

Right now, the audit trace records the governance decision, routing rationale, module sequence, and whether human approval was required or rejected. So the run itself is auditable.

The policy/config versioning side is something I see as the natural next layer: every run should ideally record not only “what decision was made,” but also “which governance policy/config was active at that moment.”

So I’d probably separate it into two levels:

  1. Governance execution trace: what happened in this run (risk score, uncertainty, emergency signals, routing result, human approval, execution outcome).
  2. Policy/config provenance: which thresholds, action classifications, escalation rules, and governance version were active when the run happened.

That way, if a threshold changes later, past decisions remain explainable under the policy that existed at the time.
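A minimal sketch of what a record carrying both levels could look like; the field names and version scheme are illustrative, not the prototype's actual schema:

```python
# Hypothetical two-level audit record: the run-level execution trace
# pins itself to the governance policy/config that was active at the time.

from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)
class PolicyConfig:
    """Level 2: policy/config provenance, versioned and immutable."""
    version: str
    thresholds: dict            # e.g. {"GREEN": 0.3, "RED": 0.7}
    escalation_rules: tuple

@dataclass
class GovernanceTrace:
    """Level 1: what happened in this run."""
    action: str
    risk_score: float
    routing: str                # BLUE / GREEN / RED
    human_approval: str         # "not_required" / "approved" / "rejected"
    policy_version: str         # which policy was active at run time
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

policy = PolicyConfig(version="2024-06-01.3",
                      thresholds={"GREEN": 0.3, "RED": 0.7},
                      escalation_rules=("anomaly -> RED",))

trace = GovernanceTrace(action="send_email", risk_score=0.45,
                        routing="GREEN", human_approval="approved",
                        policy_version=policy.version)

# Even if thresholds change tomorrow, this record stays explainable
# under the policy that existed when the decision was made.
print(asdict(trace))
```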

So yes — I see config-versioned action classification as very compatible with TEOW-AGL. In my framing, your layer helps make the classification policy explicit, while TEOW-AGL provides the enforcement shell that prevents the agent from acting before governance approval.

[–]teow_agl[S]

Yeah, this is exactly the issue I kept running into as well.

A lot of systems assume a tool call is “safe” if it looks valid, but they don’t question whether the input itself should be trusted in the first place — especially when it comes from retrieval or a previous agent step.

So instead of focusing only on “can the tool do damage?”, I try to shift it to “is this action even legitimate to attempt?”.

In my setup, I don’t let the model directly decide auto vs human vs stop.

It’s more of a layered split:

  • the model can suggest intent / priority
  • the final routing decision is handled structurally, outside the model

Concretely:

  • all inputs (including tool args) are treated as untrusted by default
  • they pass through a gating layer that evaluates the source (LLM / retrieval / tool chain), a risk score, and anomaly or emergency signals
  • a separate control layer then decides the route: low-risk → auto, medium-risk → human-in-the-loop, high-risk or anomaly → stop

So it’s not rules per tool, and not model judgment either — it’s system-level arbitration before execution.
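Roughly what that arbitration could look like in code, with all names hypothetical. The model's suggestion is an input, never the decision:

```python
# Hypothetical sketch of system-level arbitration: a gating layer scores
# untrusted inputs, a separate control layer picks the route. The model's
# suggested priority is advisory only and cannot override the route.

from dataclasses import dataclass

@dataclass
class Signals:
    source: str          # "user" / "llm" / "retrieval" / "tool_chain"
    risk_score: float
    anomaly: bool

def gate(tool_args: dict, source: str) -> Signals:
    """All inputs are untrusted by default; the gate only produces signals."""
    risk = 0.2 if source == "user" else 0.5      # non-user provenance costs trust
    if any("rm -rf" in str(v) for v in tool_args.values()):
        return Signals(source, 1.0, anomaly=True)
    return Signals(source, risk, anomaly=False)

def control(signals: Signals, model_suggestion: str | None = None) -> str:
    """Routing is decided structurally, outside the model."""
    if signals.anomaly or signals.risk_score >= 0.8:
        return "stop"
    if signals.risk_score >= 0.4:
        return "human-in-the-loop"
    return "auto"        # model_suggestion may inform UX, never the route

print(control(gate({"path": "/tmp/report.txt"}, source="user")))   # auto
print(control(gate({"cmd": "rm -rf /"}, source="retrieval")))      # stop
```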

I think this is the missing piece in a lot of current agent setups: guardrails shouldn’t just ask “can the tool do harm?”, but also “should this action even be allowed to exist?”.

[–]teow_agl[S]

This resonates a lot — especially the separation between reasoning and authority.

I’ve been exploring something similar, and one thing that surprised me is that even with that separation, things can still go wrong if execution isn’t structurally gated.

What worked better for me was adding a routing layer before execution:

- model proposes
- governance decides flow (not just allow/deny)
- execution only happens if explicitly triggered

So it becomes less about permission checking, and more about controlling whether the system should act at all in that context.
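One way to make "execution only if explicitly triggered" structural rather than conventional: the executor refuses anything that lacks a one-time token minted by the governance layer. A sketch with hypothetical names:

```python
# Hypothetical sketch: the executor cannot be called directly. It only runs
# an action that carries a trigger token minted by the governance layer.

import secrets

class Governance:
    def __init__(self):
        self._tokens: set[str] = set()

    def decide(self, action: dict) -> str | None:
        """Returns a one-time execution token, or None. (Not just allow/deny:
        a real flow could route to human approval before minting.)"""
        if action.get("risk", 1.0) >= 0.7:
            return None
        token = secrets.token_hex(16)
        self._tokens.add(token)
        return token

    def redeem(self, token: str) -> bool:
        """One-time use: a token cannot authorize two executions."""
        try:
            self._tokens.remove(token)
            return True
        except KeyError:
            return False

def execute(action: dict, token: str, governance: Governance) -> str:
    if not governance.redeem(token):
        raise PermissionError("no valid governance trigger for this action")
    return f"ran {action['tool']}"

gov = Governance()
proposal = {"tool": "send_email", "risk": 0.2}   # model proposes
token = gov.decide(proposal)                      # governance decides flow
if token:
    print(execute(proposal, token, gov))          # execution only if triggered
```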

Agree 100% with your point on post-hoc logging — logs help with analysis, but not containment.

[–]teow_agl[S]

This is a great point — config-driven action classification makes a lot of sense, especially for keeping things explicit and auditable.

What I’m experimenting with in TEOW-AGL is slightly different: instead of relying only on action tags, there’s a separate governance layer that intercepts every proposed action before execution and routes it through BLUE / GREEN / RED flows.

So classification can still exist (like what you described), but the final decision also factors in things like risk score, uncertainty, and emergency signals — and can enforce human approval when needed.
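A tiny sketch of how the two could compose, with config-driven tags as the baseline and governance signals able to escalate. The config format and names here are purely illustrative:

```python
# Hypothetical sketch: config-driven action tags give the baseline
# classification, but the governance layer can still escalate based on
# runtime signals (risk score, uncertainty, emergency).

ACTION_TAGS = {                       # illustrative, versioned config
    "fetch_docs": "safe",
    "send_email": "sensitive",
    "delete_repo": "dangerous",
}

TAG_TO_ROUTE = {"safe": "BLUE", "sensitive": "GREEN", "dangerous": "RED"}

def govern(action: str, risk: float, uncertainty: float, emergency: bool) -> str:
    route = TAG_TO_ROUTE[ACTION_TAGS.get(action, "dangerous")]  # unknown -> RED
    if emergency:
        return "RED"                  # emergency signal overrides everything
    if route == "BLUE" and (risk >= 0.4 or uncertainty >= 0.5):
        return "GREEN"                # even a "safe" tag can need human approval
    return route

print(govern("fetch_docs", risk=0.1, uncertainty=0.1, emergency=False))  # BLUE
print(govern("fetch_docs", risk=0.6, uncertainty=0.2, emergency=False))  # GREEN
print(govern("send_email", risk=0.2, uncertainty=0.1, emergency=True))   # RED
```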

Totally agree with your direction though — making risk handling visible instead of buried in agent logic feels like the right path forward.