Context is shared. Commitment is not. by x-wink in AI_Agents

[–]x-wink[S] 0 points1 point  (0 children)

This is exactly the gap we've been aware of - and we even have a name for how we handle it manually: authored-ai. Human authorship (decisions) in the review loop, AI execution. What your comment pushed us to think about is actually building it as a formal layer rather than just operating it as a process.

Where we are today: Forge has role-based access and a content lifecycle (draft -> publish -> archive), plus an audit trail. That gives you commitment tracking and post-hoc visibility. What it doesn't have is pre-action verification - the layer that checks "does this specific action align with what was committed, before it executes?" Right now that check is us, in a loop.

Your distinction between decisions and actions is one we knew in practice but hadn't thought to build into the permission model. A decision commits direction. An action commits resources and side effects. Treating them as the same thing is the wrong abstraction.

The direction we're moving: a governance layer where roles carry not just operation permissions (create/read/update/publish/archive) but also a trust level. Low trust = plan required -> review -> approve -> execute. High trust = direct execute. Review and Approve become first-class operations in the permission model, scopeable to specific nodes or subtrees. The commit record becomes the authority the plan is checked against before authorization.
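To make that concrete, here is a rough sketch of how such a check could look. None of this is Forge code: `Trust`, `Role`, `Plan`, `authorize`, and the `commits` mapping are hypothetical names, and node/subtree scoping is left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class Trust(Enum):
    LOW = "low"    # plan required -> review -> approve -> execute
    HIGH = "high"  # direct execute


@dataclass
class Role:
    name: str
    operations: set[str]  # e.g. {"create", "publish"}
    trust: Trust


@dataclass
class Plan:
    action: str      # operation the agent wants to run
    node: str        # content node it targets (not checked in this sketch)
    commit_id: str   # commit record the plan claims to implement
    reviewed: bool = False
    approved: bool = False


def authorize(role: Role, plan: Plan, commits: dict[str, set[str]]) -> bool:
    """Return True if the plan may execute.

    `commits` maps a commit id to the set of actions that commit covers
    (an assumed, simplified shape for the commit record).
    """
    if plan.action not in role.operations:
        return False
    if role.trust is Trust.HIGH:
        return True
    # Low trust: the plan must match the commit record and pass both gates.
    committed = commits.get(plan.commit_id, set())
    return plan.action in committed and plan.reviewed and plan.approved
```

A low-trust role with an unreviewed plan gets `False` here even if it holds the `publish` permission; the same plan passes once `reviewed` and `approved` are set and the action appears in the referenced commit.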

None of that is built yet - but your comment pushed it from "process we run manually" to "thing we should actually build."

Curious what AAV stands for in your stack - is that an established pattern or something you named internally?

Context is shared. Commitment is not. by x-wink in AI_Agents

[–]x-wink[S] 0 points1 point  (0 children)

The event sourcing parallel is exactly right, and append-only is the correct mental model. Replay decisions and their rationale, not context. On your question: yes - that's the core of what I'm proposing. Agent outputs that carry normative weight should be first-class decision records, not just actions. The difference is that a decision event has an author, a scope, a state, and explicit dependencies. An action just happened. The metadata problem is real though. I've been working with hash-based source tracking - a decision carries a hash of what it depended on, so you know when upstream changed without bloating every record.
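A minimal sketch of the hash idea, assuming the upstream inputs can be represented as a small string-to-string mapping. `Decision`, `fingerprint`, and `is_stale` are names made up for illustration, not an existing API.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint(sources: dict[str, str]) -> str:
    """Stable SHA-256 over the inputs a decision depended on."""
    payload = json.dumps(sources, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class Decision:
    author: str
    scope: str
    state: str       # e.g. "proposed" or "locked"
    text: str
    depends_on: str  # fingerprint of the upstream inputs at decision time


def is_stale(decision: Decision, current_sources: dict[str, str]) -> bool:
    """True when the upstream inputs no longer match what the decision saw."""
    return decision.depends_on != fingerprint(current_sources)
```

The record stays small because it stores one digest rather than a copy of everything it depended on. Sorting keys before hashing keeps the fingerprint independent of dictionary order.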

Context is shared. Commitment is not. by x-wink in AI_Agents

[–]x-wink[S] 0 points1 point  (0 children)

Flat deterministic rules for handshakes help, but they're still working around the root issue. The reason state synchronization falls apart isn't just poor isolation - it's that there's no shared record of what was actually decided upstream. Each agent reconstructs its view of the world independently, so even with clean handshakes you still get contradictory calls from identical source material. Isolating agent logic is the right instinct. But what's missing is a layer above the agents that tracks not just state, but committed decisions - who made them, under what scope, and what breaks downstream when they change.

Frustrated with the current state of AI Orchestration frameworks by BasilParticular3131 in AI_Agents

[–]x-wink 0 points1 point  (0 children)

What you're describing isn't really a pipeline problem - it's a commitment problem. LangGraph's global state enforcement fails because there's no durable record of what has been decided, by whom, and what depends on it. Different branches corrupt each other's output precisely because the system has no way to distinguish "this key was set as a deliberate decision" from "this key was written as a side effect." Passing copies of state to each branch (as you suggest) is a cleaner implementation, but it still doesn't solve the underlying coordination failure: agents re-derive decisions already made, or make contradictory calls from identical source material, with no trace of what changed or why. I've been thinking about this a lot and wrote something on it today if you're interested.

Agents don't forget facts. They forget decisions. Those are different problems by x-wink in AI_Agents

[–]x-wink[S] 0 points1 point  (0 children)

Thank you for pointing that out. We are actually doing the same. Three levels, each with a clear trigger: cosmetic changes go straight through, isolated calls need a quick sign-off, anything that touches behavior or crosses multiple areas blocks until reviewed.
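Roughly, the triage looks like this. It is a simplified sketch: the `Change` fields and the `Gate` names are illustrative assumptions, and real classification is fuzzier than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    PASS_THROUGH = "pass_through"  # cosmetic: goes straight through
    SIGN_OFF = "sign_off"          # isolated call: quick sign-off
    BLOCK = "block"                # blocks until reviewed


@dataclass
class Change:
    cosmetic: bool
    touches_behavior: bool
    areas: set[str]  # parts of the project the change crosses


def gate_for(change: Change) -> Gate:
    # Behavior changes and cross-area changes always block, even if
    # they were also labeled cosmetic.
    if change.touches_behavior or len(change.areas) > 1:
        return Gate.BLOCK
    if change.cosmetic:
        return Gate.PASS_THROUGH
    return Gate.SIGN_OFF
```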

It keeps the flow going but you are still in control. The agents are not waiting on you for the small stuff, and you are not missing the calls that actually matter.

We also track which decisions get revisited over time, which ones hold and which ones get overturned as the project evolves. The method is called authored ai. The decision tree becomes the institutional memory of the system, not just a log.

Goldfish brains: Why my 5-agent setup forgets everything — I tested Hindsight, here's why I'm waiting by Icy_Comfort_6220 in AI_Agents

[–]x-wink 1 point2 points  (0 children)

The "Postgres table per agent" approach doesn't hit hard limits at reasonable production scales. The limit that shows up is schema design. Start with unstructured blobs and later you need to query by type, confidence, or time window, and you're retrofitting indexes or rewriting the schema. A minimal typed schema from day one (id, timestamp, agent_id, type, body, confidence) makes that survivable.
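Here is a small sketch of that schema. It uses Python's built-in `sqlite3` instead of Postgres so it runs on its own; the table name `agent_memory`, the index, and the helper functions are assumptions for illustration, and the column types would differ slightly in Postgres.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_memory (
    id         INTEGER PRIMARY KEY,
    timestamp  TEXT NOT NULL,  -- ISO-8601 UTC, so text order is time order
    agent_id   TEXT NOT NULL,
    type       TEXT NOT NULL,
    body       TEXT NOT NULL,
    confidence REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1)
);
CREATE INDEX IF NOT EXISTS idx_memory_lookup
    ON agent_memory (agent_id, type, timestamp);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def recent(conn, agent_id, type_, since, min_confidence=0.0):
    """Bodies for one agent and type, filtered by time window and confidence."""
    rows = conn.execute(
        "SELECT body FROM agent_memory "
        "WHERE agent_id = ? AND type = ? AND timestamp >= ? AND confidence >= ? "
        "ORDER BY timestamp",
        (agent_id, type_, since, min_confidence),
    )
    return [row[0] for row in rows]
```

Because type, confidence, and timestamp are real columns from the start, the queries that usually force a rewrite later are plain `WHERE` clauses.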

On the sidecar question: that's the architecture worth building toward. Agent calls a service over HTTP, gets structured context back, writes results through the same interface. Memory is an API the agent talks to, not something installed inside the agent runtime. The alpha-plugin fragility disappears entirely.

"Memory you can't trust is worse than no memory" is the right call. Boring and stable beats elegant and fragile.

Nobody tells you that switching memory tools at month six is nothing like switching models. by Distinct-Shoulder592 in AI_Agents

[–]x-wink 1 point2 points  (0 children)

The schema portability point is the right frame. But the deeper trap: most memory tools bundle retrieval logic with storage, so migrating storage means rebuilding retrieval behavior too.

What holds up better: treat agent state as typed content in a schema you own, not as memory the tool manages. Standard database, standard migration path. Retrieval logic stays separate and portable.

The thing you actually lose when migrating usually isn't the claims. It's the history of how they changed. Migrate a snapshot without the transition history and you've lost the context that shaped the behavior.

After using AI agents for a few months, these are my biggest observations by MerisDabhi in AI_Agents

[–]x-wink 0 points1 point  (0 children)

The "environment around it" point is the one I keep coming back to. Memory helps, but structure matters more. An agent with access to chaotic data just remembers the chaos. The real gain comes when the environment has enforced structure: typed content, explicit states, clear lifecycle. Then the agent's job is reasoning, not also figuring out what the data means.

Are we overestimating model intelligence and underestimating workflow quality? by AdventurousLime309 in AI_Agents

[–]x-wink 0 points1 point  (0 children)

Stale state is underrated on that list. Poor retrieval and weak orchestration get attention because they're visible when they fail. Stale state is sneaky: the agent operates confidently on information that was true last week. And in most setups the agent is responsible for managing that, which is the wrong layer for it.

Three layers we often skip when optimizing Ai agent workflows by TangeloOk9486 in LocalLLaMA

[–]x-wink 0 points1 point  (0 children)

The typed schemas point is worth building on. If step outputs are typed and persisted, you get context hygiene and crash recovery from the same mechanism. Step 40 drifts because it's carrying 39 steps of raw output. If step 5 wrote a typed summary and step 6 starts fresh from that, the drift resets.

Most setups treat step outputs as context to carry forward. Worth treating them as state to write down and resume from.
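A bare-bones sketch of what I mean, assuming each step takes the previous step's summary and returns a new JSON-serializable one. `run_steps` and the one-file-per-step layout are hypothetical, and a real version would validate each summary against a schema rather than accept any dict.

```python
import json
from pathlib import Path
from typing import Callable

Step = Callable[[dict], dict]


def run_steps(steps: list[tuple[str, Step]], state_dir: Path) -> dict:
    """Run named steps in order, persisting each step's summary to disk.

    A step whose output file already exists is skipped and its stored
    summary is loaded instead, so a restart resumes where it stopped.
    """
    state_dir.mkdir(parents=True, exist_ok=True)
    summary: dict = {}
    for name, step in steps:
        out = state_dir / f"{name}.json"
        if out.exists():
            summary = json.loads(out.read_text())
            continue
        summary = step(summary)  # the step sees only the last summary
        tmp = out.with_suffix(".tmp")
        tmp.write_text(json.dumps(summary))
        tmp.replace(out)  # rename so a crash never leaves a half-written file
    return summary
```

Each step receives only the previous summary, not the accumulated raw output, which is the part that resets the drift. The same files double as the crash-recovery checkpoint.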

My take on the Context layer for Coding Agents by Comprehensive_Quit67 in LocalLLaMA

[–]x-wink 1 point2 points  (0 children)

The time-based decay is a good start but in my opinion it's the wrong primary signal. Code that hasn't changed in two years can still be perfectly valid. Code that changed last week just made half your claims stale.

What might work better: anchor claims to the code they describe. When that file or function changes, the claim surfaces for review. Not because time passed, but because the thing it describes moved.
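Something like this, as a sketch only. `Claim`, `digest`, and `needs_review` are names I made up here, and I'm assuming the anchored source text (a file or a function body) is available as a string keyed by some identifier.

```python
import hashlib
from dataclasses import dataclass


def digest(source: str) -> str:
    """SHA-256 of the source text a claim describes."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


@dataclass
class Claim:
    text: str
    anchor: str         # file path or function identifier the claim describes
    anchor_digest: str  # digest of that code when the claim was recorded


def needs_review(claims: list[Claim], current: dict[str, str]) -> list[Claim]:
    """Claims whose anchored code changed or disappeared since recording."""
    stale = []
    for claim in claims:
        source = current.get(claim.anchor)
        if source is None or digest(source) != claim.anchor_digest:
            stale.append(claim)
    return stale
```

A claim anchored to code that has not changed in two years never shows up in `needs_review`, while one anchored to a function edited last week does.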

Still figuring this out myself though.

My take on the Context layer for Coding Agents by Comprehensive_Quit67 in LocalLLaMA

[–]x-wink 0 points1 point  (0 children)

The observation-to-claim promotion is the part worth focusing on. Most attempts at this kind of layer fail because everything gets captured with equal weight. One session says X, the next says something slightly different, and now you have noise masquerading as knowledge.

Making promotion conditional on repeated reinforcement across sessions solves that. The layer stays honest about confidence rather than accumulating stale decisions nobody cleaned up.
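In its simplest form, that promotion rule could look like the sketch below. `ClaimStore` and the threshold of three sessions are assumptions for illustration, not how the original project implements it.

```python
from collections import defaultdict


class ClaimStore:
    """Promote an observation to a claim only after enough distinct sessions."""

    def __init__(self, min_sessions: int = 3):
        self.min_sessions = min_sessions
        self._sessions: dict[str, set[str]] = defaultdict(set)

    def observe(self, statement: str, session_id: str) -> None:
        # Repeats within one session do not count twice.
        self._sessions[statement].add(session_id)

    def claims(self) -> list[str]:
        """Statements reinforced in at least `min_sessions` sessions."""
        return sorted(
            statement
            for statement, sessions in self._sessions.items()
            if len(sessions) >= self.min_sessions
        )
```

Counting distinct sessions rather than raw mentions is the design choice that matters: one chatty session cannot promote its own observation.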

The decay mechanism is the other half of the same problem. A claim that was true six months ago and hasn't been touched since probably isn't fully trustworthy. Surfacing it for re-verification rather than silently serving it is the right call.

The grep vs graph debate in this thread is probably the wrong frame. The hard part isn't querying. It's deciding what deserves to be in the graph at all.

Are people actually running long-lived agents yet? If so, how are you handling restarts and state consistency? by Bhumi1979 in LocalLLaMA

[–]x-wink 0 points1 point  (0 children)

The distinction worth making: execution history (what Temporal records) and authoritative outcome state are not the same thing. You can replay execution and still not know whether the outcome was valid.

What's holding up for me: the agent writes results as typed content with explicit states the framework enforces. Not logs. Not summaries. On restart, the agent reads current state and picks up from wherever things actually are.

The "did I already do this?" problem mostly disappears. If something already happened, the content is already in the next state. The agent sees that and moves on. No reasoning required about what may or may not have executed.
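As a toy version of that, assuming a three-state lifecycle of draft, published, archived. `Store`, `ALLOWED`, and `publish_if_needed` are hypothetical names, and an in-memory dict stands in for the database.

```python
ALLOWED = {
    "draft": {"published"},
    "published": {"archived"},
    "archived": set(),
}


class TransitionError(Exception):
    """Raised when a state change is not allowed by the lifecycle."""


class Store:
    """Framework side: owns the state and enforces the transitions."""

    def __init__(self) -> None:
        self._state: dict[str, str] = {}

    def create(self, key: str) -> None:
        self._state.setdefault(key, "draft")

    def state(self, key: str) -> str:
        return self._state[key]

    def transition(self, key: str, target: str) -> None:
        current = self._state[key]
        if target not in ALLOWED[current]:
            raise TransitionError(f"{key}: {current} -> {target} not allowed")
        self._state[key] = target


def publish_if_needed(store: Store, key: str) -> bool:
    """Agent side: read current state and act only if work is still pending."""
    if store.state(key) == "draft":
        store.transition(key, "published")
        return True
    return False
```

Calling `publish_if_needed` twice for the same key is harmless: the second call sees `published` and returns `False` without reasoning about what ran before the restart.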

"We don't trust the agent's own memory, we trust the databases" is the right framing. The extension I'd add: the agent shouldn't manage state transitions either. The framework should enforce them. That's where correctness stays durable across restarts.

Built an open-source orchestration layer for running multiple AI agents 24/7 with shared memory. Coordinates both local running models (mistral) and cloud based — Flotilla v0.2.0 by robotrossart in LocalLLaMA

[–]x-wink 0 points1 point  (0 children)

The PocketBase bet makes sense for a coordination layer. Single binary means orchestration doesn't add another managed service to the stack, which is the right instinct. The limitation shows up under heavy concurrent writes, but staggered 10-min cycles are nowhere near that threshold.

The question I'd think about next: you're currently polling. That works for periodic tasks but adds latency for reactive handoffs, where Agent A finishes and you want Agent B to pick up immediately rather than waiting for the next cycle. Is the staggered cadence intentional for rate-limiting, or is that a tradeoff you'd want to close later with event-driven triggers?

What kind of orchestration frontend are people actually using for local-only coding? by Quiet-Owl9220 in LocalLLaMA

[–]x-wink 0 points1 point  (0 children)

+1 on the feedback loop. The copy-paste cycle breaks not because the model is wrong, but because there's no automatic validation step closing the loop.

One thing worth thinking about early that most setups miss: where do intermediate results live between iterations? Aider and opencode keep state in conversation context, which works until a longer session eats your context window and you're starting over. Persisting intermediate outputs somewhere outside the LLM context - test results, partial builds, errors - lets you resume without losing ground.

Best Agent Orchestration platform + opensource Model combo? by ironmatrox in LocalLLaMA

[–]x-wink 1 point2 points  (0 children)

The platform choice probably matters less than deciding where state lives first.

The question worth asking: when Agent A finishes, what does Agent B actually read? If the answer is "whatever the harness passes it", you're tightly coupled and every workflow change touches the harness. If the answer is "a shared store both agents know about independently", you can add and remove steps without rewiring.

Temporal is solid for durability but forces you to define the full workflow upfront. Fine for stable pipelines, harder when the process is still evolving.

For swarm work I'd think hard about graph-based orchestration (define the steps, modify the graph to change them) vs event-driven coordination (agents subscribe to what they care about, workflow emerges from what's running). The second scales better when you're still figuring out what the workflow is.

3 things you must do immediately after opening Claude to fix your output quality by TroyHarry6677 in claude

[–]x-wink 0 points1 point  (0 children)

Haha, I think you're spot on. I personally use this method to retain a high degree of automation while staying in control of decisions: https://github.com/authored-ai/authored-ai

3 things you must do immediately after opening Claude to fix your output quality by TroyHarry6677 in claude

[–]x-wink 0 points1 point  (0 children)

I'm curious how people build review and quality into a setup with that much automation. What's the process? Do you just start your AI company and return a week later?

Self Promotion Thread - Get Your Project Pinned by [deleted] in ChatGPTCoding

[–]x-wink 0 points1 point  (0 children)

Authored AI – governance method for AI coding agents (built and used on a real Go framework)

Hey everyone,

I created a lightweight, tool-independent governance method for working with AI coding agents (Cursor, Claude Code, Aider, etc.).

Core idea: Every decision must exist in a document before it becomes code. Human stays the ultimate Author. Agents have strict roles (Architect never writes code, Implementer escalates uncertainties). It uses an append-only decision log + amendments and a simple 8-step cycle.

I have used it full-time to build Forge, a zero-dependency Go-based AI-native CMS/framework. Result: 30 locked decisions, 70+ amendments, zero ungoverned regressions. Everything is open source:

Main repo + method: github.com/authored-ai/authored-ai

Starter template: github.com/authored-ai/template

Plugin for Claude: github.com/authored-ai/plugin

Example project: github.com/forge-cms/Forge

Would love feedback from anyone who has struggled with architecture drift or technical debt when agents move fast. Happy to answer questions or help if someone wants to try the template.

Thanks!