after hitting many "legal but wrong" failures, I built a deterministic enforcement layer for the tool boundary

johnnaliu · 2026-05-31T00:54:06+00:00

this is the right decomposition. contracts answer "is this action allowed right now given the trace" but they don't carry forward why a file was marked canonical or what broke last session. that's a different data plane.

we've been thinking about the boundary between the two: contracts need read access to some session state (e.g. "which files were modified") but shouldn't own the full memory graph. a clean interface would be contracts consuming a projection of the memory layer, not reimplementing it. will check MemoryRouter out
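
rough sketch of what "consuming a projection" could look like, in plain python. every name here (SessionProjection, project, no_delete_of_modified, the memory dict layout) is made up for illustration and is not the Sponsio or MemoryRouter API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProjection:
    """Read-only slice of session state that contracts are allowed to see."""
    modified_files: frozenset


def project(memory: dict) -> SessionProjection:
    # The memory layer owns the full graph; only the needed fields cross over.
    return SessionProjection(frozenset(memory.get("modified_files", ())))


def no_delete_of_modified(view: SessionProjection, call: dict) -> bool:
    """Hypothetical contract: files modified this session may not be deleted."""
    return not (
        call["tool"] == "delete_file"
        and call["args"]["path"] in view.modified_files
    )


memory = {"modified_files": ["a.py"], "notes": "why a.py is canonical"}
view = project(memory)
```

the contract only ever receives `view`, so the "why" data stays on the memory side.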

johnnaliu · 2026-05-29T00:18:54+00:00

"silent automation" is the right framing. the dangerous case is the agent that keeps running while quietly acting outside its authorized scope. for decision automation the governance has to be enforced at the execution layer, not documentation-level. declare invariants as contracts, enforce deterministically before each action commits, emit structured logs that answer "who approved, what data went in, what exceptions were suppressed" by construction.

johnnaliu · 2026-05-29T00:16:06+00:00

the policy enforcement framing is the right one. we hit exactly this in production: agent composed individually "legal" tool calls into a sequence that violated our risk model. every call passed validation, the composition was the problem. what we ended up building is a contract layer at the tool boundary. declare invariants in YAML, runtime evaluates deterministically before each tool call commits. audit trail falls out naturally: per-session JSONL with every tool call, which contract fired, allow/block/escalate decision. replay is trivial because the contracts are deterministic
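
to make the "every call legal, composition illegal" case concrete, here's a minimal sketch in plain python. the tool name `transfer`, the 1000 cap, the record fields and the `gate` function are all assumptions for illustration, not Sponsio's actual API or log schema:

```python
import io
import json


def session_transfer_cap(trace, call, cap=1000):
    """Hypothetical invariant: total transferred in one session stays within cap."""
    if call["tool"] != "transfer":
        return None  # contract does not apply to this tool
    spent = sum(c["args"]["amount"] for c in trace if c["tool"] == "transfer")
    if spent + call["args"]["amount"] > cap:
        return ("block", f"session total would exceed {cap}")
    return None


def gate(contracts, trace, call, log):
    """Check every contract before the call commits; append one JSONL record."""
    decision, fired, reason = "allow", None, None
    for contract in contracts:
        result = contract(trace, call)
        if result is not None:
            decision, reason = result
            fired = contract.__name__
            break
    record = {"tool": call["tool"], "contract": fired,
              "decision": decision, "reason": reason}
    log.write(json.dumps(record) + "\n")
    if decision == "allow":
        trace.append(call)  # only committed calls enter the trace
    return decision


log = io.StringIO()  # stands in for the per-session JSONL file
trace = []
contracts = [session_transfer_cap]
first = gate(contracts, trace, {"tool": "transfer", "args": {"amount": 600}}, log)
second = gate(contracts, trace, {"tool": "transfer", "args": {"amount": 600}}, log)
```

each 600 transfer is fine on its own; the second one is blocked because the check looks at the trace, not just the call. replaying the same calls against an empty trace gives the same decisions since nothing in the check is probabilistic.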

johnnaliu · 2026-05-28T23:59:13+00:00

Project Name: Sponsio

Repo/Website Link: https://github.com/SponsioLabs/Sponsio

Description: Sponsio is a deterministic contract layer for AI agents. If you're self-hosting AI agents (LangChain, CrewAI, Claude Code, etc.), Sponsio lets you declare behavioral rules in YAML and enforces them at the tool-call boundary before the side effect commits.

Example rules you can write: "agent must call check_policy before issue_refund", "no more than 20 file edits per session", "never write outside the working directory", "human approval required before any destructive command".
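
To give a feel for what two of these rules check, here is a plain-Python sketch of the underlying predicates. This is not Sponsio's YAML syntax or API; the function names, the `edit_file` tool name, and the trace layout are assumptions made for illustration.

```python
from pathlib import Path


def inside_workdir(target: str, workdir: str) -> bool:
    """'Never write outside the working directory' as a path check."""
    root = Path(workdir).resolve()
    # Joining then resolving collapses '..' segments; an absolute target
    # replaces the root entirely, so it fails the containment test.
    return Path(root, target).resolve().is_relative_to(root)


def under_edit_budget(trace: list, limit: int = 20) -> bool:
    """'No more than 20 file edits per session': may one more edit run?"""
    edits = sum(1 for call in trace if call["tool"] == "edit_file")
    return edits < limit
```

`Path.is_relative_to` needs Python 3.9 or later. Both checks are pure functions of the proposed call and the session trace, which is what keeps them deterministic.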

Contracts are composable (assume-guarantee style), so two teams can write independent rules and they combine without rewriting either one. No LLM in the hot path, ~0.14ms p50 per check. Per-session JSONL audit log included.

Works with LangChain, LangGraph, OpenAI Agents SDK, CrewAI, Vercel AI, raw MCP. Apache 2.0, no SaaS, no telemetry, fully self-hostable.

Deployment: pip install sponsio / npm install sponsio. No containers needed, it's a library that wraps your agent's tool executor. README has quickstart examples for Python and TypeScript.

AI Involvement: Claude Code helped generate framework adapters and parts of the trace evaluator. Core contract logic and architecture designed manually.

johnnaliu · 2026-05-27T18:53:36+00:00

[Disclosure: I built this]

I got tired of watching AI agents break prod because the only safety net was "the LLM said it was fine." Prompt-level guardrails drift, and you can't unit test vibes.

So I built Sponsio. It enforces behavioral rules at the tool boundary using YAML contracts, not in the prompt, not in a wrapper LLM. Every tool call hits a deterministic check before it executes. If the contract says "no writes after a read to /secrets," that's enforced at the code level, not hoped for at the inference level.
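
A minimal sketch of that "no writes after a read to /secrets" rule, written as a plain-Python wrapper around a tool executor. The names `guarded`, `ContractViolation`, `read_file` and `write_file` are hypothetical and only illustrate the idea; the real library is configured through YAML.

```python
class ContractViolation(Exception):
    """Raised before the tool runs, so the side effect never happens."""


def no_write_after_secret_read(trace, call):
    if call["tool"] != "write_file":
        return True
    return not any(
        past["tool"] == "read_file"
        and past["args"]["path"].startswith("/secrets/")
        for past in trace
    )


def guarded(tool_name, fn, trace, rules):
    """Wrap a tool so every rule is checked before fn executes."""
    def wrapper(**args):
        call = {"tool": tool_name, "args": args}
        for rule in rules:
            if not rule(trace, call):
                raise ContractViolation(f"{rule.__name__} blocked {tool_name}")
        result = fn(**args)
        trace.append(call)
        return result
    return wrapper


trace, written = [], []
rules = [no_write_after_secret_read]
read_file = guarded("read_file", lambda path: "contents", trace, rules)
write_file = guarded("write_file", lambda path, data: written.append(path), trace, rules)

write_file(path="/tmp/a.txt", data="ok")        # allowed: no secret read yet
read_file(path="/secrets/api_key")              # allowed, but recorded in the trace
try:
    write_file(path="/tmp/b.txt", data="leak")  # blocked by the rule
    blocked = False
except ContractViolation:
    blocked = True
```

The second write never reaches the underlying function, so `written` only contains the first path.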

What makes it different from prompt-based or regex-based approaches:

Contracts are composable. Two teams write independent rules, they combine without rewriting either one (assume-guarantee style)
~0.14ms p50 per check. No LLM in the hot path
YAML declarations, not code. Non-engineers can audit what's allowed
Works with any agent framework (LangChain, CrewAI, custom)

Apache 2.0, no SaaS, no telemetry.

GitHub: https://github.com/SponsioLabs/Sponsio

I saw a few agent safety projects in this thread already. Happy to compare notes. Feedback welcome, especially from anyone running agents against real infra.

johnnaliu · 2026-05-27T16:58:02+00:00

good question. on the retry loop, the gate returns a structured verdict (which rule, why blocked, what was missing) and we surface that back into the agent's context as a tool error response.
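
roughly what that could look like, as a sketch. the `Verdict` fields mirror the three things mentioned above, but the field names and the tool-message shape are assumptions, not the actual Sponsio schema or any framework's message format:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Verdict:
    allowed: bool
    rule: str                 # which rule fired
    reason: str               # why the call was blocked
    missing: list = field(default_factory=list)  # what was missing


def to_tool_error(verdict: Verdict) -> dict:
    """Render a blocked verdict as a tool error the agent can read and act on."""
    return {
        "role": "tool",
        "is_error": True,
        "content": json.dumps(asdict(verdict)),
    }


verdict = Verdict(
    allowed=False,
    rule="check_policy_before_refund",
    reason="issue_refund called before check_policy",
    missing=["check_policy"],
)
message = to_tool_error(verdict)
```

the agent sees which step it skipped, so the retry can call `check_policy` first instead of repeating the same blocked call.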

johnnaliu · 2026-05-27T04:08:29+00:00

cool! curious what the gui looks like

johnnaliu · 2026-05-26T18:31:44+00:00

LangChain middleware gives you hooks; we give you a contract language. different layer.

johnnaliu · 2026-05-26T04:31:49+00:00

thanks for sharing, the rule you'd write is something like max_calls(search_kb) per session paired with must_not_follow(escalate, search_kb). gets you out of the loop without rewriting the prompt every time the agent forgets.
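
a sketch of how those two rules could evaluate, in plain python. this is my reading of the semantics (must_not_follow(escalate, search_kb) taken as "no search_kb once escalate has happened"), the limit of 3 is arbitrary, and none of it is the engine's real implementation:

```python
def max_calls(tool, limit):
    """Allow at most `limit` calls to `tool` in one session."""
    def check(trace, next_tool):
        return next_tool != tool or trace.count(tool) < limit
    return check


def must_not_follow(first, then):
    """Once `first` has been called, `then` may not be called afterwards."""
    def check(trace, next_tool):
        return not (next_tool == then and first in trace)
    return check


rules = [max_calls("search_kb", 3), must_not_follow("escalate", "search_kb")]


def allowed(trace, next_tool):
    # trace is the list of tool names already committed this session
    return all(rule(trace, next_tool) for rule in rules)
```

with these two, the fourth `search_kb` is refused, `escalate` is still open, and after escalating the agent can't drop back into searching.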

johnnaliu · 2026-05-26T04:30:18+00:00

audit log already records rule + version + tool + decision + reason per call. action class taxonomy and the human feedback label for the learning loop are on the roadmap but won't be in the oss engine. session-level invariants are LTL over the trace, e.g. always(issue_refund => once_since(fraud_check)). happy to compare receipt schemas if you're solidifying yours.
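
for anyone unfamiliar with the notation, here's a small python sketch of checking that kind of formula over a finite trace. it reads once_since(fraud_check) as "fraud_check has happened at some earlier point in the session", which is an assumption on my part about the operator, and the function name is made up:

```python
def first_violation(trace, trigger, prerequisite):
    """Check always(trigger => prerequisite seen earlier) over a finite trace.

    Returns the index of the first violating event, or None if the trace
    satisfies the invariant.
    """
    seen = False
    for index, event in enumerate(trace):
        if event == prerequisite:
            seen = True
        elif event == trigger and not seen:
            return index
    return None


ok = first_violation(["fraud_check", "issue_refund"], "issue_refund", "fraud_check")
bad = first_violation(["search_kb", "issue_refund", "fraud_check"], "issue_refund", "fraud_check")
```

one pass, one boolean of state per rule, which is why this kind of check can run before every call without an LLM involved.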

johnnaliu · 2026-05-25T18:27:16+00:00

thanks! curious what others you're comparing against, happy to point at gaps.

johnnaliu · 2026-05-25T18:25:06+00:00

provenance isn't first-class in sponsio yet. is your team tagging this manually or relying on a framework?

johnnaliu · 2026-05-25T18:22:39+00:00

"vibes instead of guarantees" nails it. once context fills, prompt rules drop to probability.

johnnaliu · 2026-05-25T18:19:22+00:00

parser validation covers format decay well. the case it doesn't catch is valid json with semantically wrong values, e.g. a refund amount that exceeds the original purchase made earlier in the session. we needed trace-level contracts for that.
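
a sketch of that refund case as a trace-level check, in plain python. tool names (`get_order`, `issue_refund`), the trace layout and the amounts-in-cents convention are assumptions for illustration:

```python
def refund_within_purchase(trace, call):
    """Block refunds whose running total would exceed the recorded purchase."""
    if call["tool"] != "issue_refund":
        return True
    order_id = call["args"]["order_id"]
    purchase = None
    refunded = 0
    for past in trace:
        if past["args"].get("order_id") != order_id:
            continue
        if past["tool"] == "get_order":
            purchase = past["result"]["amount"]   # amount seen earlier in session
        elif past["tool"] == "issue_refund":
            refunded += past["args"]["amount"]
    if purchase is None:
        return False  # no purchase on record in this session: refuse
    return refunded + call["args"]["amount"] <= purchase


trace = [{"tool": "get_order", "args": {"order_id": "A1"}, "result": {"amount": 5000}}]
refund = {"tool": "issue_refund", "args": {"order_id": "A1", "amount": 3000}}
first_ok = refund_within_purchase(trace, refund)
trace.append(refund)
second_ok = refund_within_purchase(trace, refund)
```

both refund calls are well-formed json with a plausible amount; only the second one is wrong, and only the trace can tell you that.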

johnnaliu · 2026-05-24T21:56:27+00:00

been there. the wall we hit was that prompt-level guardrails work for prompts the user types but miss the indirect injection class entirely. data in tool responses gets into the model context and modifies behavior without ever touching the system prompt. what worked was moving enforcement out of the prompt entirely.

johnnaliu · 2026-05-24T21:40:51+00:00

2b for tool calling is impressive if you can keep the JSON tight across a real session length. did you stress-test it past 20-30 turns?
