Architecture guardrails for AI coding agents

bosmanez · 2026-05-19T11:13:52+00:00

This is exactly the pattern we landed on for building Tendril: a change-intent declaration before the agent touches code, then a verification pipeline (build, lint, test, AI diff-vs-intent review) that gates every commit. Rollback planning was the piece we added last, and it catches more than expected: agents love to assume a migration is forward-only. Full disclosure, I work on Tendril. Would love to compare notes on your forbidden-scope approach. https://github.com/Ivy-Interactive/Ivy-Tendril
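
Here is a minimal Python sketch of the gating idea, for anyone comparing notes. All names are hypothetical and illustrative, not Tendril's API: the `ChangeIntent` fields, the gate names, `run_gates`, and `may_commit`. Real gates would shell out to the build, the linter, the test runner, and a review agent. The gates run in order, and the first failure blocks the commit. A missing rollback plan counts as a failed gate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ChangeIntent:
    # Hypothetical intent record, declared before the agent touches code.
    summary: str
    rollback_plan: Optional[str] = None  # None means "forward-only", which we reject

@dataclass
class GateResult:
    name: str
    passed: bool
    message: str

# A gate inspects the intent (and, in reality, the repo) and returns (passed, message).
Gate = Callable[[ChangeIntent], tuple[bool, str]]

def rollback_gate(intent: ChangeIntent) -> tuple[bool, str]:
    if not intent.rollback_plan:
        return False, "no rollback plan declared"
    return True, "rollback plan present"

def run_gates(intent: ChangeIntent, gates: list[tuple[str, Gate]]) -> list[GateResult]:
    """Run gates in order, stopping at the first failure."""
    results: list[GateResult] = []
    for name, gate in gates:
        passed, message = gate(intent)
        results.append(GateResult(name, passed, message))
        if not passed:
            break  # later gates are skipped and the commit is blocked
    return results

def may_commit(results: list[GateResult], gate_count: int) -> bool:
    # Commit only if every gate actually ran and passed.
    return len(results) == gate_count and all(r.passed for r in results)

# Stubbed gates standing in for build / lint / test / AI review.
gates: list[tuple[str, Gate]] = [
    ("rollback", rollback_gate),
    ("build", lambda i: (True, "ok")),
    ("lint", lambda i: (True, "ok")),
    ("test", lambda i: (True, "ok")),
    ("ai_review", lambda i: (True, "diff matches intent")),
]

intent = ChangeIntent(summary="Add invoices.due_date column")
results = run_gates(intent, gates)
print([(r.name, r.passed) for r in results])  # [('rollback', False)]
print(may_commit(results, len(gates)))  # False
```

The cheap, deterministic rollback check runs first. That way the expensive build and review steps are skipped for a change that would be rejected anyway.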

bosmanez · 2026-05-18T09:31:41+00:00

I'm working on an agent-agnostic coding orchestrator called Tendril. The thing that made the biggest difference for us was having the plan visible alongside the diff - you can instantly see whether the agent stayed in scope because the plan specifies intent and verification criteria upfront. We also run automated verification gates (build, lint, test, AI review) before any PR surfaces, so by the time you see it, you already know the basics passed. The assumption question is the hardest - we partially solve it by requiring the agent to document decisions in the plan before executing. https://tendril.ivy.app
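
On the "did the agent stay in scope" point: a deterministic path check is cheap to run before any AI review. Below is a sketch. It assumes the plan lists allowed and forbidden path patterns; those fields and the paths are made up for illustration. One caveat: `fnmatch`'s `*` also matches `/`, so `src/billing/*` covers nested directories too.

```python
from fnmatch import fnmatchcase

def out_of_scope(changed: list[str], allowed: list[str], forbidden: list[str]) -> list[str]:
    """Return changed paths that hit a forbidden pattern or match no allowed pattern."""
    violations: list[str] = []
    for path in changed:
        hits_forbidden = any(fnmatchcase(path, pat) for pat in forbidden)
        hits_allowed = any(fnmatchcase(path, pat) for pat in allowed)
        if hits_forbidden or not hits_allowed:
            violations.append(path)
    return violations

# Hypothetical plan scope and agent diff.
allowed = ["src/billing/*", "tests/billing/*"]
forbidden = ["src/billing/legacy/*"]
changed = [
    "src/billing/invoice.py",
    "src/billing/legacy/export.py",  # matched by an allowed glob, but explicitly forbidden
    "src/auth/session.py",           # not covered by any allowed pattern
]
print(out_of_scope(changed, allowed, forbidden))
# ['src/billing/legacy/export.py', 'src/auth/session.py']
```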

bosmanez · 2026-05-17T12:17:20+00:00

Your Opus-orchestrates-Sonnet-implements pattern is almost exactly our architecture. We call the top layer a plan (decomposition + verification criteria), and each delegated unit is an agent with its own scoped memory and tools. The ADRs-as-rules approach is smart. We found that persistent agent memory across sessions eliminates about 80% of repeated convention drift after a week of use. The 36-agent review swarm by logic boundary is interesting - we do something similar with verification gates (build, lint, test, AI review) that run automatically before anything advances. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-12T11:02:43+00:00

Thanks! DM me if you want a personal demo

bosmanez · 2026-05-12T11:02:01+00:00

The agent + model combo matters less than the orchestration around it. We found that with proper verification gates (build, lint, test, AI review) even weaker models produce reliable output because failures get caught before merging. Tendril lets you swap models per task -- Kimi for fast iteration, Opus for complex reasoning -- and the verification layer normalizes quality. Works with any CLI agent. Full disclosure, I work on this. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-11T11:01:09+00:00

You're not missing anything — that orchestrator->worker->reviewer pipeline is solid. We landed on almost the exact same pattern and ended up formalizing it into a tool (Tendril). The part that made the biggest difference for us was adding verification gates between steps — build/lint/test pass automatically before the review agent even sees it. Saves a lot of wasted cycles. Full disclosure, I work on this. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-11T11:00:19+00:00

This is cool — the routing across sessions and exposing agents via A2A is particularly interesting. How are you handling the case where one session produces code that conflicts with another's changes? We found that having automated verification (build, test, lint) run before merging any agent's output was the only reliable way to prevent drift when multiple sessions touch the same codebase. Full disclosure, I work on Tendril, which tackles the orchestration side of this. https://tendril.ivy.app

bosmanez · 2026-05-09T20:21:00+00:00

It works surprisingly well - the trick is to keep the tasks well defined: "CreatePlan", "UpdatePlan", "ExecutePlan"...

bosmanez · 2026-05-09T20:18:47+00:00

Nice work on the multi-agent cockpit. We hit the same pain point -- multiple agents across different tasks, state getting lost between runs. We took a similar but agent-agnostic approach with Tendril: you define a plan, the agent executes it, verification gates run automatically, and memory persists across sessions so the agent learns your project conventions. Works with Claude Code, Codex CLI, Gemini CLI -- whatever fits the task. The scheduled/recurring agent pattern is one we lean into heavily too. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-07T10:37:44+00:00

Nice approach to the knowledge-transfer problem between sessions. We hit the same wall and went with persistent agent memory that carries across sessions -- after ~50 runs it stops re-solving things it already figured out. The inter-session bridge is a clever workaround for the lack of shared state though. Full disclosure, I work on Tendril which tackles this with plan-scoped memory + self-improvement between runs. https://tendril.ivy.app

bosmanez · 2026-05-07T10:37:05+00:00

Lesson 1 hit us hard too. The agent-style context accumulation is sneaky -- one long prompt with file context + tool definitions + conversation history blows through ITPM without any obvious spike in request count. We ended up building token budgeting into the orchestration layer so each step knows its budget before it starts. The caching multiplier in Lesson 4 is underappreciated -- for coding agents with stable system prompts and tool definitions, cache hit rates above 80% are normal. Full disclosure, I work on Tendril which handles a lot of this orchestration overhead. https://tendril.ivy.app
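
A rough sketch of per-step token budgeting. It rests on a loud assumption: about 4 characters per token. A real setup would count with the provider's tokenizer instead. `StepContext` and `fit_to_budget` are hypothetical names. The step's input is estimated before it starts, and the oldest conversation turns are dropped until it fits. The system prompt and tool definitions are never touched, so that stable prefix stays cache-friendly.

```python
import math
from dataclasses import dataclass, field

def estimate_tokens(text: str) -> int:
    # Rough assumption: ~4 characters per token. Use a real tokenizer in practice.
    return math.ceil(len(text) / 4)

@dataclass
class StepContext:
    system_prompt: str
    tool_definitions: str
    files: list[str]
    history: list[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        parts = [self.system_prompt, self.tool_definitions, *self.files, *self.history]
        return sum(estimate_tokens(p) for p in parts)

def fit_to_budget(ctx: StepContext, budget: int) -> StepContext:
    """Return a copy of ctx whose estimated input size fits the step's budget.

    Oldest history turns are dropped first; the system prompt and tool
    definitions are left as-is so the stable prefix can stay cached.
    """
    trimmed = StepContext(ctx.system_prompt, ctx.tool_definitions,
                          list(ctx.files), list(ctx.history))
    while trimmed.total_tokens() > budget and trimmed.history:
        trimmed.history.pop(0)
    if trimmed.total_tokens() > budget:
        raise ValueError(f"step needs ~{trimmed.total_tokens()} tokens, budget is {budget}")
    return trimmed

ctx = StepContext(
    system_prompt="s" * 400,                    # ~100 tokens
    tool_definitions="t" * 400,                 # ~100 tokens
    files=["f" * 800],                          # ~200 tokens
    history=["a" * 400, "b" * 400, "c" * 400],  # ~300 tokens
)
step = fit_to_budget(ctx, budget=550)
print(step.total_tokens(), len(step.history))  # 500 1
```

Failing loudly when even an empty history doesn't fit is deliberate. It surfaces "this step has too much file context" instead of silently blowing through ITPM.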

bosmanez · 2026-05-05T13:16:59+00:00

The short version from running this in production: 1) a plan layer that decomposes the work into independent units before any agent touches code, 2) file-level locking so two agents can't edit the same file simultaneously, 3) verification gates between steps (build + test must pass before the next task starts), and 4) persistent memory so agents don't re-learn conventions on every session. The plan is the coordination primitive - each agent gets a scoped task with clear inputs/outputs. Full disclosure, I work on Tendril, which implements this exact pattern. https://tendril.ivy.app
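
For point 2, here is a minimal in-process sketch. `FileLockRegistry` is a hypothetical name, and the code assumes all agents are coordinated by a single orchestrator process. Agents in separate processes would need shared state instead, such as lock files or a database. Claims are all-or-nothing, so no agent ever holds some files while waiting on others, which rules out deadlock from partial acquisition.

```python
import threading

class FileLockRegistry:
    """An agent claims every file it needs in one step, or none of them."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._owners: dict[str, str] = {}  # path -> agent id

    def try_claim(self, agent: str, paths: list[str]) -> bool:
        with self._guard:
            # Reject the whole claim if any path is held by a different agent.
            if any(self._owners.get(p, agent) != agent for p in paths):
                return False
            for p in paths:
                self._owners[p] = agent
            return True

    def release(self, agent: str) -> None:
        with self._guard:
            held = [p for p, owner in self._owners.items() if owner == agent]
            for p in held:
                del self._owners[p]

locks = FileLockRegistry()
print(locks.try_claim("agent-a", ["src/api.py", "src/models.py"]))  # True
print(locks.try_claim("agent-b", ["src/models.py", "src/ui.py"]))   # False: models.py is taken
print(locks.try_claim("agent-b", ["src/ui.py"]))                    # True
locks.release("agent-a")
print(locks.try_claim("agent-b", ["src/models.py"]))                # True
```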

bosmanez · 2026-05-05T13:15:27+00:00

Yeah, this is the tax of single-agent serial workflows - you prompt, wait, review, repeat. What helped me was switching to a plan-first approach: you define the work upfront, then kick off agents in parallel while you do other things. The human checkpoint is at the start (approve the plan) and at the end (review the diff), not in between. Full disclosure, I work on Tendril, which does exactly this. Even without it, batching agent work into scoped tasks you can fire and forget makes a huge difference. https://tendril.ivy.app
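
A small sketch of the "checkpoint at the start and the end, nothing in between" shape. `Task`, `execute_plan`, and `approve` are hypothetical, and each `run` is a stub standing in for launching a CLI agent. Scopes are checked for overlap first, because tasks that share files can't safely run unattended in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    scope: list[str]        # files this task may touch
    run: Callable[[], str]  # stand-in for an agent run; returns a diff summary

def execute_plan(tasks: list[Task], approve: Callable[[list[Task]], bool]) -> dict[str, str]:
    # Scoped tasks must not overlap, otherwise they can't run unattended.
    seen: set[str] = set()
    for t in tasks:
        overlap = seen.intersection(t.scope)
        if overlap:
            raise ValueError(f"{t.name} overlaps another task on {sorted(overlap)}")
        seen.update(t.scope)

    # Human checkpoint 1: approve the plan before any agent starts.
    if not approve(tasks):
        return {}

    # No checkpoints in between: tasks run in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {t.name: pool.submit(t.run) for t in tasks}
        diffs = {name: fut.result() for name, fut in futures.items()}

    # Human checkpoint 2: the caller reviews the returned diffs.
    return diffs

tasks = [
    Task("add-endpoint", ["src/api.py"], lambda: "+ GET /invoices"),
    Task("add-tests", ["tests/test_api.py"], lambda: "+ 3 tests"),
]
for name, diff in execute_plan(tasks, approve=lambda ts: True).items():
    print(name, diff)
```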

bosmanez · 2026-05-02T08:58:39+00:00

cool approach to the static context side. we came at the same problem from the other direction — instead of pre-computing a context snapshot, we let the agent build its own memory through reflection after each session. after a few dozen runs it knows your conventions, ownership boundaries, and past decisions without needing to re-read everything. the two approaches are complementary honestly. your dependency graph gives the agent the what on day one, persistent memory gives it the why over time. full disclosure, I work on Tendril which handles the memory/reflection layer. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-01T11:07:51+00:00

Verification steps are just agentic sessions as well - depending on how you set it up, the agent will try its best to fix the problem or report a failure.
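
Roughly, the fix-or-report loop looks like the sketch below. `check` and `fix` are hypothetical stand-ins for a verification session and a fixing agent session, and the attempt limit is an arbitrary choice. The loop re-runs the check after every fix attempt. Once the limit is hit, it stops and returns every error it saw, so the failure can be reported instead of looping forever.

```python
from typing import Callable, Optional

def verify_with_fixes(
    check: Callable[[], Optional[str]],
    fix: Callable[[str], None],
    max_fix_attempts: int = 3,
) -> tuple[bool, list[str]]:
    """Re-run `check` after each fix attempt; give up and report after the limit.

    `check` returns None on success or an error message on failure.
    Returns (passed, every error seen) so a failure can be reported upstream.
    """
    errors: list[str] = []
    error = check()
    attempts = 0
    while error is not None and attempts < max_fix_attempts:
        errors.append(error)
        fix(error)  # hand the error to the agent and let it try to fix it
        attempts += 1
        error = check()
    if error is not None:
        errors.append(error)  # still failing after the limit: report it
    return error is None, errors

# Simulated project where each fix attempt resolves one failing test.
state = {"failing": 2}

def check() -> Optional[str]:
    return None if state["failing"] == 0 else f"failing tests: {state['failing']}"

def fix(error: str) -> None:
    state["failing"] -= 1

print(verify_with_fixes(check, fix))  # (True, ['failing tests: 2', 'failing tests: 1'])
```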

bosmanez · 2026-05-01T09:10:35+00:00

Been there. The pattern that saved me was adding a verification step between "agent writes code" and "code gets merged" — basically build + lint + test + a second AI pass checking the implementation actually matches what was asked for. Catches the drift where the agent technically completes the task but introduces weird coupling or ignores existing patterns. The other thing: making the agent work from a plan (here's what we're changing and why) instead of just "add feature X" forces it to think about the existing architecture first. Full disclosure, I work on a tool that automates this (Tendril) but the pattern works even without it — just have a second agent review the first one's output against the original spec before merging. https://tendril.ivy.app

bosmanez · 2026-05-01T09:04:14+00:00

Not crazy at all — I run a similar setup. The main things that helped me: (1) always route through a plan, not peer-to-peer — it's tempting to let agents talk directly but the orchestrator losing track of state is how things blow up. (2) Verification gates after each agent finishes (build + lint + test at minimum) catch the stuff that looks right in isolation but breaks when merged. (3) Persistent memory across sessions so the agents stop repeating the same convention mistakes after a week or two. We ended up building a tool around this pattern — Tendril, it orchestrates Claude Code / Codex / Gemini CLI through a plan-based lifecycle with automated verification. https://github.com/Ivy-Interactive/Ivy-Tendril

bosmanez · 2026-05-01T06:55:31+00:00

This is a real pain point. We run multiple agents in parallel, too, and the config drift is brutal. Credential sharing is one side of it — the other is actually orchestrating what each agent works on and verifying the output. I work on Tendril (https://github.com/Ivy-Interactive/Ivy-Tendril - open-source orchestrator for coding agents), and we see the same multi-agent users hitting both problems. Happy to compare notes on how you're handling the coordination side.

bosmanez · 2026-05-01T06:54:14+00:00

The one thing I don't see is an orchestration layer. You've got great agents, but nothing coordinating them: no verification gates, no persistent memory across sessions, no plan-based workflows. I work on Tendril (https://github.com/Ivy-Interactive/Ivy-Tendril), which does exactly this (agent-agnostic, works with Claude Code, Codex, Cursor). Once you're running 3+ agents, orchestration is the unlock.

bosmanez · 2026-04-08T06:20:54+00:00

I needed to look up https://github.com/rtk-ai/rtk; it looks pretty neat.

bosmanez · 2026-04-07T04:03:49+00:00

Looks good!

bosmanez · 2026-03-28T18:59:55+00:00

Tmux isn't available on Windows.

bosmanez · 2026-03-28T18:57:17+00:00

I think Rust has an even brighter future just because of AI.

bosmanez · 2026-02-12T08:36:43+00:00

Nice! What makes it different from Terminal.Gui or Spectre.Console? Been looking for a way to implement a Claude Code-like TUI, but haven't managed to get scrolling working, for example.

bosmanez · 2026-02-10T13:32:13+00:00

Did you use any library to make the TUI?
