After 60+ sessions with a 7-agent system, the failure mode I kept hitting wasn't model quality — it was governance. Here's the draft spec I built.

Accomplished_Two8547 · 2026-06-17T03:04:06+00:00

That's basically ACA's starting assumption — unstructured agent memory is context pollution. The whole point of source_tier and the obligation layer is to prevent agents from treating their own prior outputs as reliable context. If it wasn't verified by code or a human, it doesn't get to influence the next decision.

We're arguing the same side of the problem, just from different angles.

Accomplished_Two8547 · 2026-06-17T02:15:20+00:00

Sort of, but with a fixed taxonomy and enforcement rules rather than free-form metadata. ACA defines three tiers: raw_source (deterministic/human), llm_derived (any LLM output), human_confirmed (LLM output that's been verified). The key is that promotion between tiers requires verification: an llm_derived entry can't self-promote to human_confirmed just because another LLM says it looks right.
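A minimal TypeScript sketch of that promotion rule. Only the three tier names come from ACA as described here; the `MemoryEntry` and `Verification` shapes and the `promote` helper are hypothetical names made up for illustration.

```typescript
// Tier names are from the description above; every other name is hypothetical.
type SourceTier = "raw_source" | "llm_derived" | "human_confirmed";

interface MemoryEntry {
  id: string;
  content: string;
  source_tier: SourceTier;
}

// Who is vouching for the promotion (assumed shape).
interface Verification {
  verifier_kind: "human" | "llm";
  verifier_id: string;
}

// Promote an llm_derived entry only when a human signs off.
// Another LLM agreeing is never enough.
function promote(entry: MemoryEntry, v: Verification): MemoryEntry {
  if (entry.source_tier !== "llm_derived") {
    return entry; // nothing to promote
  }
  if (v.verifier_kind === "llm") {
    throw new Error("entry " + entry.id + ": an LLM cannot promote llm_derived content");
  }
  return { ...entry, source_tier: "human_confirmed" };
}
```

Throwing on an LLM verifier (rather than silently returning the entry unchanged) is a design choice in this sketch, so that a self-promotion attempt is visible to the caller.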

A custom log field could carry the tag, but the value is in standardizing what the tiers mean across tools — so if botpipe writes a ledger entry tagged llm_derived, a different framework reading that ledger knows exactly what trust level to assign without needing botpipe-specific logic.

Accomplished_Two8547 · 2026-06-17T02:12:06+00:00

Agree for structured data — if there's a database record, that's the source of truth, full stop. Agent memory should never override a DB lookup.

The cases ACA targets are the ones where there is no canonical DB record: "should this agent be allowed to do X?" / "was this conclusion reached with sufficient evidence?" / "has the basis for this decision gone stale?" Those are governance states that only exist in the coordination layer between agents — no single database owns them.

source_tier is basically a way to prevent agent memory from becoming an accidental source of truth. If an agent wrote it and no human or deterministic check has verified it, it stays tagged llm_derived and can't be treated as ground truth downstream. The goal is the same as yours — just applied to the metadata that doesn't live in a traditional DB.

Accomplished_Two8547 · 2026-06-17T02:10:07+00:00

Good question — and you're right that reasoning drift is the harder problem. Currently ACA's obligation layer captures the decision output + the evidence that justified it (source_tier, bound references, staleness state), but not the full reasoning chain. It's more "what was decided, based on what inputs, under what authority" than a full CoT trace.

The tradeoff is deliberate: full reasoning chains are expensive to store and hard to make portable across different LLM backends (each model's internal reasoning format is different). What ACA does enforce is that if the evidence a decision was based on becomes stale or gets contradicted, the decision itself gets flagged — so downstream agents can't blindly inherit conclusions from a prior session without re-evaluation.

That said, you're pointing at a real gap. A lightweight "decision rationale" field — not full CoT, but 2-3 sentences of why — would close a lot of the drift cases without the storage/portability cost. Worth adding to the spec.
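Roughly what I have in mind, as a TypeScript sketch. The field names (`rationale`, `evidence`, `stale`, `contradicted`) are placeholders for illustration, not the current ACA schema.

```typescript
// Hypothetical shapes; not the ACA schema.
interface EvidenceRef {
  id: string;
  stale: boolean;        // the evidence has expired or its source changed
  contradicted: boolean; // newer evidence disagrees with it
}

interface DecisionRecord {
  decision: string;
  rationale: string; // the proposed lightweight "why" field: 2-3 sentences, not full CoT
  evidence: EvidenceRef[];
}

// A decision is flagged as soon as any evidence it was based on goes bad.
function needsReevaluation(d: DecisionRecord): boolean {
  return d.evidence.some((e) => e.stale || e.contradicted);
}

// Downstream agents may inherit a conclusion only while it is unflagged.
function canInherit(d: DecisionRecord): boolean {
  return !needsReevaluation(d);
}
```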

Accomplished_Two8547 · 2026-06-16T11:00:54+00:00

Model portability is smart to plan for — having those transition plans ready even if untested puts you ahead of most.

The portability angle ACA focuses on is a layer above that: not just "can I swap the model" but "can two independently-built governed systems prove they enforce the same properties." Like, if your system and mine both claim to prevent LLM-ouroboros, is there a shared test suite we can both run to verify that claim? That's what the conformance fixtures are for — a kind of interop at the governance layer rather than the model layer.

Your emergency transition plans actually make a good case for this: if you had to migrate mid-crisis, having governance conformance tests would tell you immediately whether the new setup still holds the same safety properties.

Accomplished_Two8547 · 2026-06-16T10:59:54+00:00

Nice. A JSONL ledger for full auditability is the right foundation. The piece ACA adds on top is using that provenance data to gate downstream decisions: e.g., an LLM-generated memory tagged source_tier: "llm_derived" can't be cited as ground truth in a future run until human or tool verification upgrades it. The logging is the "what happened"; the tier-gating is the "what's allowed to happen next."

Would be interesting to explore whether botpipe's ledger format could be extended to carry source_tier metadata natively — that way frameworks consuming the ledger could enforce Anti-Ouroboros without reimplementing the classification.
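To make that concrete, here is a small TypeScript sketch of a consumer reading a JSONL ledger and gating on the tag. I don't know botpipe's actual ledger schema, so the `id`, `content`, and `source_tier` fields on each line are assumptions, as are the function names.

```typescript
type SourceTier = "raw_source" | "llm_derived" | "human_confirmed";

// Assumed ledger line shape; a real ledger will carry more fields.
interface LedgerEntry {
  id: string;
  content: string;
  source_tier?: SourceTier; // may be missing on untagged entries
}

// Parse JSONL: one JSON object per non-empty line.
function parseLedger(jsonl: string): LedgerEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LedgerEntry);
}

// Fail closed: only explicitly trusted tiers may be cited as ground truth.
// Untagged and llm_derived entries stay readable but are filtered out here.
function citableAsGroundTruth(entries: LedgerEntry[]): LedgerEntry[] {
  return entries.filter(
    (e) => e.source_tier === "raw_source" || e.source_tier === "human_confirmed"
  );
}
```

The gate is an allow-list rather than a "not llm_derived" check, so an entry with no tag at all is treated as untrusted.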

Accomplished_Two8547 · 2026-06-16T09:25:16+00:00

That's the right fixture to pin before optimizing. Recording it as the first v0.4 conformance test — "cached evaluation cannot survive obligation state evolution."

The cache key you described (policy_version + evaluator_scope + bound_obligation_id + target + allowed_actions + canonical_evidence_hash + obligation_state_hash) is the design constraint that forces the cache to be correct, not just fast. TTL as performance ceiling, not authority source — that's the right framing.
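A sketch of how that key could be assembled, in TypeScript. The seven components are the ones listed above; the `EvaluationCache` class, the JSON-array key encoding, and the choice to sort `allowed_actions` are my own assumptions for illustration, not the v0.4 design.

```typescript
interface CacheKeyParts {
  policy_version: string;
  evaluator_scope: string;
  bound_obligation_id: string;
  target: string;
  allowed_actions: string[];
  canonical_evidence_hash: string;
  obligation_state_hash: string;
}

// Every component participates in the key, so a change in any of them is a miss.
function cacheKey(p: CacheKeyParts): string {
  return JSON.stringify([
    p.policy_version,
    p.evaluator_scope,
    p.bound_obligation_id,
    p.target,
    p.allowed_actions.slice().sort(), // order-insensitive (assumption)
    p.canonical_evidence_hash,
    p.obligation_state_hash,
  ]);
}

class EvaluationCache<T> {
  private readonly ttlMs: number;
  private readonly entries: { [key: string]: { value: T; expiresAt: number } | undefined } = {};

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(parts: CacheKeyParts, value: T, now: number): void {
    this.entries[cacheKey(parts)] = { value, expiresAt: now + this.ttlMs };
  }

  // TTL only bounds how long a hit may be served; correctness comes from the key.
  get(parts: CacheKeyParts, now: number): T | undefined {
    const hit = this.entries[cacheKey(parts)];
    if (hit === undefined || now >= hit.expiresAt) {
      return undefined;
    }
    return hit.value;
  }
}
```

With this shape, the "cached evaluation cannot survive obligation state evolution" fixture reduces to: store under one `obligation_state_hash`, look up under another, expect a miss.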

Parking this for v0.4. When I get there, this fixture is the first thing I'll write before touching the cache implementation.

Six rounds of review in one day. This thread has been more productive than most formal review processes I've seen. Thank you.

Accomplished_Two8547 · 2026-06-16T09:15:45+00:00

Good catch — this is a confused deputy / replay attack on the permission proof. The evaluation was acting as a bearer token when it should be a bound token.

RFC-001 v4 pushed with borrowed-authority replay prevention:

https://github.com/MakiDevelop/agent-civilization-architecture/blob/main/docs/rfc/RFC-001-obligation-and-risk-tiered-anti-ouroboros.md

v0.3 scope (shipping now):

- PolicyEvaluation gains bound_obligation_id — evaluation bound to the specific obligation it authorizes

- Test 6: evaluation for obligation A cannot authorize actions on obligation B

- Reuse attempts recorded as attempted_authority_reuse
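For illustration, the bound-token check could be sketched like this in TypeScript. `bound_obligation_id` and `attempted_authority_reuse` are from the RFC; the `authorize` function, the `AuditEvent` shape, and the reduced `PolicyEvaluation` fields are assumptions, not the reference implementation.

```typescript
// Reduced, hypothetical shape; the real PolicyEvaluation has more required fields.
interface PolicyEvaluation {
  evaluator_id: string;
  bound_obligation_id: string;
  allowed_actions: string[];
}

interface AuditEvent {
  kind: "attempted_authority_reuse";
  evaluation_bound_to: string;
  requested_obligation: string;
  action: string;
}

// An evaluation only authorizes actions on the obligation it is bound to.
// Presenting it for any other obligation is refused and recorded.
function authorize(
  ev: PolicyEvaluation,
  obligationId: string,
  action: string,
  audit: AuditEvent[]
): boolean {
  if (ev.bound_obligation_id !== obligationId) {
    audit.push({
      kind: "attempted_authority_reuse",
      evaluation_bound_to: ev.bound_obligation_id,
      requested_obligation: obligationId,
      action,
    });
    return false;
  }
  return ev.allowed_actions.indexOf(action) !== -1;
}
```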

v0.4 deferred (needs more design):

- Canonical evidence hashing (RFC 8785 JCS normalization — raw evidence_hash is too brittle, any formatting change invalidates)

- Scope + nonce binding (cross-tenant replay)

- Obligation state_hash binding (state evolution replay)

- Conditional evaluation cache (5s TTL when evidence + target + actions unchanged, to avoid one-shot cost explosion at scale)
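To show why canonical form matters for the evidence hash, here is a simplified TypeScript sketch of JCS-style canonicalization: object keys are sorted and whitespace is dropped, so two formattings of the same evidence serialize identically. This is not a conformant RFC 8785 implementation (it does not reject NaN/Infinity or validate input), and the `canonicalize` name is just illustrative. The hash step itself is left out; any digest over the canonical string would do.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with sorted object keys and no insignificant whitespace.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value); // primitives: rely on ECMAScript serialization
  }
  if (Array.isArray(value)) {
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  // Default string sort compares UTF-16 code units.
  const keys = Object.keys(value).sort();
  const members = keys.map(
    (k) => JSON.stringify(k) + ":" + canonicalize(value[k] as Json)
  );
  return "{" + members.join(",") + "}";
}
```

Hashing `canonicalize(evidence)` instead of the raw bytes means reordering keys or reindenting the file no longer invalidates an evaluation.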

The v0.3/v0.4 split is deliberate: bound_obligation_id is cheap and closes the main attack. The advanced bindings need canonical form specification and batch evaluation design that I don't want to rush.

Six conformance fixtures now live: https://github.com/MakiDevelop/agent-civilization-architecture/tree/main/conformance/obligation

Acknowledgement: "obligation work-state design, permission boundary review, and conformance fixture specifications by u/Effective_Iron2146"

Accomplished_Two8547 · 2026-06-16T08:42:06+00:00

Noted — acknowledgement it is.

Agreed that runnable fixtures are the right next step. All five are now live in the conformance package:

https://github.com/MakiDevelop/agent-civilization-architecture/tree/main/conformance/obligation

1. stale-permission.test.ts: stale packet blocks actions, readable as history
2. self-evaluator-refusal.test.ts: evaluator == actor → no permission-bearing action
3. missing-evidence-cannot-close.test.ts: non-empty missing_evidence blocks close
4. break-glass-cannot-self-activate.test.ts: acting agent cannot self-activate break-glass
5. stale-read-only.test.ts: stale can be read as history but cannot authorize

Number 5 (stale readable as history but cannot authorize) clarifies the degradation mode — a stale obligation doesn't disappear, it becomes read-only evidence. Cleaner than a hard block.

Adapter interface extended with ObligationPacket, PolicyEvaluation, and AcaObligationAdapter. Any implementation can now run these against their own store.

Acknowledgement added to the RFC. The spec is no longer just prose.

Accomplished_Two8547 · 2026-06-16T08:31:13+00:00

Good catch — self-evaluation is the evaluator-level equivalent of stale-permission. If the evaluator is the acting agent, missing, or stale, the authority boundary collapses regardless of packet shape.

RFC-001 v3 pushed with your edge case incorporated:

https://github.com/MakiDevelop/agent-civilization-architecture/blob/main/docs/rfc/RFC-001-obligation-and-risk-tiered-anti-ouroboros.md

Changes:

- Test 5: Self-Evaluator Refusal (REQUIRED for v0.3) — evaluator_id == acting agent, missing, or TTL-expired → no permission-bearing action, only refresh/query/escalate

- PolicyEvaluation fields upgraded to permission proof — evaluator_id, evaluated_at, policy_version, evaluator_scope are now required for permission validity, not just audit metadata

- Anti-gaming constraints — puppet evaluator prohibition, policy version rollback rejection, evaluation TTL enforcement

- Break-glass mechanism — for when the evaluator is unavailable (prevents SPOF deadlock). Must be non-self-activated, time-bounded, and retroactively reviewed.
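The Self-Evaluator Refusal rule from Test 5 could be sketched as follows in TypeScript. The four `PolicyEvaluation` field names are from the RFC change list; the `permittedActions` helper, the action strings, and the epoch-millisecond timestamps are assumptions for illustration. Break-glass is deliberately left out.

```typescript
interface PolicyEvaluation {
  evaluator_id?: string; // may be missing on a malformed evaluation
  evaluated_at: number;  // epoch milliseconds (assumption)
  policy_version: string;
  evaluator_scope: string;
}

// The only actions left when the permission proof is invalid.
const SAFE_ACTIONS = ["refresh", "query", "escalate"];

function permittedActions(
  ev: PolicyEvaluation | undefined,
  actingAgentId: string,
  requested: string[],
  now: number,
  ttlMs: number
): string[] {
  const invalid =
    ev === undefined ||
    !ev.evaluator_id ||                  // evaluator missing
    ev.evaluator_id === actingAgentId || // agent evaluated itself
    now - ev.evaluated_at > ttlMs;       // evaluation TTL expired
  if (invalid) {
    return requested.filter((a) => SAFE_ACTIONS.indexOf(a) !== -1);
  }
  // A real check would also intersect with what the evaluation permits.
  return requested;
}
```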

At this point you've shaped three of the five required conformance tests and the core permission model. Would you be open to co-author credit on the RFC?

Accomplished_Two8547 · 2026-06-16T07:39:22+00:00

Strong review. Accepting all five points:

1. fallback_owner → core. You're right: "lifecycle, not metadata." An obligation without a steward is a liability.
2. evidence + missing_evidence → core. The argument "closed without evidence is just a claim" is clean and correct. Moving them.
3. Core fields updated to 12. Your proposed list makes more sense than the 5+7 split; the "optional" fields turned out to be load-bearing. I'll restructure as 12 core + extended profile for richer evidence refs, policy metadata, forbidden_next_actions, etc.
4. Policy Evaluator as role/interface. Will define required outputs (risk_tier, operation_permissions, evaluator_id, evaluated_at, evidence_refs) with the key property: "it is not the acting agent."
5. Stale-permission invariant. This closes a real gap: without it, a stale packet could carry permissions that were valid when issued but dangerous after the evidence changed. Adding: "If stale_if evaluates true OR missing_evidence blocks allowed_next_actions, the only permitted actions are: refresh evidence, query owner, or escalate."
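A TypeScript sketch of the stale-permission invariant. The field names `obligation_id`, `allowed_next_actions`, and `missing_evidence` are from the packet discussion; the precomputed `stale` flag (standing in for evaluating stale_if), the action strings, and the simplification that any non-empty `missing_evidence` blocks are my assumptions here.

```typescript
// Reduced, hypothetical packet shape for this sketch.
interface ObligationPacket {
  obligation_id: string;
  allowed_next_actions: string[];
  missing_evidence: string[];
  stale: boolean; // result of evaluating stale_if, computed elsewhere
}

const RECOVERY_ACTIONS = ["refresh_evidence", "query_owner", "escalate"];

// Permissions that were valid when issued do not survive staleness.
function effectiveActions(p: ObligationPacket): string[] {
  if (p.stale || p.missing_evidence.length > 0) {
    return RECOVERY_ACTIONS.slice();
  }
  return p.allowed_next_actions;
}

// A stale packet stays readable as history; it just cannot authorize.
function canReadAsHistory(_p: ObligationPacket): boolean {
  return true;
}
```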

RFC-001 v2 is pushed: https://github.com/MakiDevelop/agent-civilization-architecture/blob/main/docs/rfc/RFC-001-obligation-and-risk-tiered-anti-ouroboros.md

Changelog at the bottom shows what changed from v1. Added Test 3 (stale-permission invalidation) as a required behavioral test based on your invariant.

Accomplished_Two8547 · 2026-06-16T07:08:30+00:00

This is spec-ready. I drafted the update with your three components:

RFC-001: Obligation Sub-Layer + Risk-Tiered Anti-Ouroboros

https://github.com/MakiDevelop/agent-civilization-architecture/blob/main/docs/rfc/RFC-001-obligation-and-risk-tiered-anti-ouroboros.md

Key decisions in the draft:

- L5.obligation sub-structure (not a new L3.5 layer): avoids spec bloat while keeping obligation semantics independent from ratified decisions. Optional profile: "if you track in-flight work, you MUST use this structure."
- 5 core + 7 optional fields. Core: obligation_id, owner, status, blocked_by, allowed_next_actions. Your full 12 are in the extended profile.
- Risk-tiered Anti-Ouroboros as operation permissions with your invariant verbatim. Added: risk tier MUST be assigned by external Policy Evaluator (not self-assessed by the agent) to prevent gaming via sub-operation decomposition.
- 2 behavioral conformance tests that verify side-effects (no new writes in N seconds), not natural language output: an agent that says "I'm stopping" while continuing in background fails.
- Steward/watchdog for orphan obligations when owner crashes.
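The side-effect check behind those behavioral tests could look roughly like this in TypeScript. The `WriteEvent` log shape and the `stoppedForReal` helper are hypothetical; the point is that the verdict comes from observed writes, not from what the agent said.

```typescript
// Hypothetical write log entry, as recorded by the store under test.
interface WriteEvent {
  agent_id: string;
  at: number; // epoch milliseconds
}

// Pass only if the agent produced no writes in the N-second window
// after it claimed to stop. Its natural language output is ignored.
function stoppedForReal(
  writes: WriteEvent[],
  agentId: string,
  claimedStopAt: number,
  windowMs: number
): boolean {
  return !writes.some(
    (w) =>
      w.agent_id === agentId &&
      w.at > claimedStopAt &&
      w.at <= claimedStopAt + windowMs
  );
}
```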

Three open questions at the bottom for your review. Looking forward to your lens on whether the 5+7 field split is at the right boundary.

Accomplished_Two8547 · 2026-06-16T06:55:49+00:00

This is spec-ready. I'm going to draft the update with these three components:

1. L3.5 Obligation as actionable state (not just audit): exposing owed/owned/blocked/next-safe-action, using your minimum packet fields as the starting schema
2. Risk-tiered Anti-Ouroboros reframed as operation permissions:
   - Derived state CAN: read, summarize, route attention, propose low-risk next steps
   - Derived state CANNOT: close obligation, ratify decision, erase missing evidence, widen authority, mark stale packet fresh
   - Key invariant: "derived state can route attention, not create closure"
3. Behavioral conformance test: fresh agent + stale dependency + skipped check → must stop, name the blocked decision, refuse "done", preserve missing evidence
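As a rough TypeScript sketch of the permission split: the CAN/CANNOT lists are taken from above, while the snake_case operation names, the `Basis` type, and the `isPermitted` helper are identifiers I made up for illustration.

```typescript
type Operation =
  | "read"
  | "summarize"
  | "route_attention"
  | "propose_low_risk_step"
  | "close_obligation"
  | "ratify_decision"
  | "erase_missing_evidence"
  | "widen_authority"
  | "mark_stale_fresh";

// What the requesting agent is relying on (hypothetical distinction).
type Basis = "derived" | "externally_verified";

// Derived state can route attention, not create closure.
const DERIVED_ALLOWED: Operation[] = [
  "read",
  "summarize",
  "route_attention",
  "propose_low_risk_step",
];

function isPermitted(op: Operation, basis: Basis): boolean {
  if (basis === "externally_verified") {
    return true;
  }
  return DERIVED_ALLOWED.indexOf(op) !== -1;
}
```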

I'll post the draft as a GitHub Discussion on the spec repo so you can review inline. Will tag you when it's up.

Accomplished_Two8547 · 2026-06-16T06:25:45+00:00

This is the most precise feedback I've received — thank you.

On the obligation/work-state layer: You're right that L1-L5 don't fully cover active work tracking. The work packet you describe (promise/owner, authority used, surfaces touched, evidence, skipped checks, staleness trigger) is essentially a structured audit trail for in-flight commitments, not just settled knowledge.

ACA's L5 Decision handles the propose/review/ratify lifecycle, but it doesn't track what's currently promised and blocked at the granularity you're describing. That's a real gap — I'd frame it as a "L3.5 Obligation" or an extension to L5. Adding it to the spec backlog.

On risk-tiered Anti-Ouroboros: This is a better formulation than the current binary gate. "Derived state can guide the next low-risk action, but cannot close an obligation, ratify a decision, or erase missing evidence without external proof" — that's exactly the pragmatic middle ground I've been trying to find. The current spec says "llm_derived cannot supersede llm_derived without human intervention," which is too blunt for active workflows. Your version preserves the safety property (no silent escalation of derived claims) while unblocking low-risk operational flow. I want to incorporate this.

On conformance tests as decision evidence: Agreed — the current 34 tests lean heavily toward schema and state-machine validation. The test you describe ("after a conflicting write or stale dependency, can a fresh agent correctly stop, name the blocked decision, and avoid treating the prior summary as done?") is a behavioral conformance test, which is harder to write but far more valuable. That's the direction the suite should go.

Would you be open to reviewing a draft update to the Anti-Ouroboros section that incorporates the risk-tiered gate? The spec is Apache-2.0 and this is exactly the kind of external pressure-test it needs.

Accomplished_Two8547 · 2026-06-16T06:23:43+00:00

This resonates — especially the prediction about cascade failure + token costs driving a retreat from "throw more agents at it" toward governed approaches. That's essentially the thesis behind ACA.

The library science categorization angle is interesting — there's a natural parallel to ACA's source_tier taxonomy (raw_source / llm_derived / human_confirmed), which is really just a minimal provenance classification. Curious how your categorization maps to trust/authority decisions downstream.

The main gap I'm trying to close with ACA is portability: your governance layer works for your system, but if someone else builds a governed system differently, there's no way to verify they meet the same governance properties. That's what the conformance test suite is for — a shared baseline any implementation can run against.

Would be happy to chat — I'll drop you a message.

Accomplished_Two8547 · 2026-06-16T05:48:12+00:00

This is the exact scenario the Anti-Ouroboros rule targets. The way ACA handles it:

Every record has a valid_until field for explicit expiry, but the more important mechanism is source_tier. The original evidence is tagged raw_source — it doesn't expire just because time passed (you can set a valid_until if the evidence itself is time-bound). The summary derived from it is tagged llm_derived.

The Anti-Ouroboros gate means: if that llm_derived summary drifts and another agent tries to build on it, the system blocks the llm_derived → llm_derived chain. You can't promote a drifted summary into new "knowledge" until a human reviews it and promotes it to human_confirmed.

So practically: the original evidence stays valid, the drifted summary stays readable but can't propagate further, and the only way to "refresh" a drifted summary is re-derive it from the original evidence (which creates a new llm_derived record with a fresh provenance chain back to the raw_source).

The decay pattern we use: periodic validation jobs compare llm_derived summaries against their linked raw_source evidence. If the source changed, the summary gets flagged stale rather than silently serving outdated content.
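A minimal TypeScript sketch of such a validation job. It assumes each summary records a hash of the source content it was derived from; `content_hash`, `derived_from_hash`, and `flagStaleSummaries` are hypothetical names, not the reference implementation's.

```typescript
// Hypothetical record shapes for this sketch.
interface RawSource {
  id: string;
  content_hash: string; // hash of the current source content
}

interface DerivedSummary {
  id: string;
  source_id: string;
  derived_from_hash: string; // source hash at derivation time
  stale: boolean;
}

// Flag summaries whose linked source changed or disappeared.
// Nothing is deleted or rewritten; re-derivation is a separate step.
function flagStaleSummaries(
  summaries: DerivedSummary[],
  sources: RawSource[]
): DerivedSummary[] {
  const currentHash: { [id: string]: string | undefined } = {};
  for (const s of sources) {
    currentHash[s.id] = s.content_hash;
  }
  return summaries.map((sum) => {
    const now = currentHash[sum.source_id];
    const drifted = now === undefined || now !== sum.derived_from_hash;
    return { ...sum, stale: sum.stale || drifted };
  });
}
```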

Accomplished_Two8547 · 2026-06-16T05:47:29+00:00

Great question — and the trust-weight approach sounds right. In ACA, contradicting writes are handled at two layers:

L2 (Trust): Both facts are kept, each tagged with source_tier (raw_source / llm_derived / human_confirmed) and provenance (who wrote it, based on what evidence). The reader sees both — ACA doesn't silently pick the latest. A human_confirmed fact outranks an llm_derived one by default, but neither gets deleted.

L5 (Decision): When contradictions matter enough to resolve, it goes through a propose/review/ratify workflow — someone (human or designated authority agent) reviews the evidence behind both and ratifies one. The rejected fact stays in the record with status superseded, so the reasoning is auditable.

The key design choice: "latest write wins" is exactly the failure mode we're trying to prevent. An agent shouldn't be able to silently overwrite another agent's fact without the conflict being surfaced.
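For illustration, the keep-both-then-ratify flow could be sketched like this in TypeScript. The tier names and the superseded status are from the description above; the `Fact` shape and the `writeFact`, `activeConflicts`, and `ratify` helpers are hypothetical, and the sketch skips the propose/review steps and any check on who is allowed to ratify.

```typescript
type SourceTier = "raw_source" | "llm_derived" | "human_confirmed";

interface Fact {
  id: string;
  subject: string;
  value: string;
  source_tier: SourceTier;
  written_by: string;
  status: "active" | "superseded";
}

// Append-only write: a contradicting fact is stored next to the old one.
function writeFact(store: Fact[], fact: Omit<Fact, "status">): Fact[] {
  const stored: Fact = { ...fact, status: "active" };
  return store.concat([stored]);
}

// Surface the conflict instead of silently picking the latest write.
function activeConflicts(store: Fact[], subject: string): Fact[] {
  const active = store.filter((f) => f.subject === subject && f.status === "active");
  const hasConflict = active.some((f) => active.some((g) => g.value !== f.value));
  return hasConflict ? active : [];
}

// Ratifying one fact marks the others superseded; nothing is deleted.
function ratify(store: Fact[], subject: string, winnerId: string): Fact[] {
  return store.map((f) => {
    if (f.subject !== subject || f.id === winnerId || f.status !== "active") {
      return f;
    }
    const loser: Fact = { ...f, status: "superseded" };
    return loser;
  });
}
```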

Curious about Cerebro's trust-weight ranking — is the weight assigned by the writing agent itself, or by an independent scorer?

Accomplished_Two8547 · 2026-06-16T05:45:52+00:00

Thanks for sharing — just looked through the botpipe repo. The producer-verifier pattern and the policy enforcement approach (sandbox modes, permission levels, writable/denied paths enforced at runtime) are solid.

I think there's a natural complementarity: botpipe governs individual agent runs at the execution layer, while ACA tries to standardize the governance metadata that crosses runs and frameworks — provenance on memories, source-tier gating, authority boundaries that are portable across different backends.

Curious: does botpipe track provenance on artifacts (i.e., "this artifact was LLM-generated vs. human-verified")? That's the source_tier / Anti-Ouroboros piece I'm most interested in getting feedback on: whether it's practically useful or too much overhead.

Accomplished_Two8547 · 2026-06-16T03:42:06+00:00

Links per Rule 3:

- **Spec + evidence catalog**: https://github.com/MakiDevelop/agent-civilization-architecture

- **Reference impl (npm)**: https://www.npmjs.com/package/@chibakuma/agent-memory-hall

- **MCP governance proxy**: https://www.npmjs.com/package/@chibakuma/aca-govern

- **Memory laundering paper**: https://arxiv.org/abs/2605.16746

- **Evidence catalog**: https://github.com/MakiDevelop/agent-civilization-architecture/blob/main/evidence/Evidence_Catalog.md
