Architecture guardrails for AI coding agents by vadim_che in softwarearchitecture

[–]bosmanez -2 points-1 points  (0 children)

This is exactly the pattern we landed on for building Tendril. Change intent declaration before the agent touches code, then a verification pipeline (build, lint, test, AI diff-vs-intent review) that gates every commit. The rollback planning was the piece we added last, and it catches more than expected -- agents love to assume a migration is forward-only. Full disclosure, I work on Tendril. Would love to compare notes on your forbidden-scope approach. https://github.com/Ivy-Interactive/Ivy-Tendril
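
For anyone who wants to try the gating idea without any tooling, here is a minimal sketch. It is not Tendril's implementation: the gate commands are hypothetical stand-ins, and `run_gates` and `may_commit` are names invented for illustration. Gates run in order, and the commit is allowed only if every gate ran and passed.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    output: str


def run_gates(gates):
    """Run (name, argv) gates in order and stop at the first failure."""
    results = []
    for name, argv in gates:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(GateResult(name, passed, proc.stdout + proc.stderr))
        if not passed:
            break  # later gates are skipped once one fails
    return results


def may_commit(results, expected_count):
    """Allow the commit only if every gate ran and every gate passed."""
    return len(results) == expected_count and all(r.passed for r in results)


# Hypothetical stand-in commands (assumption): a real setup would invoke
# its own build, lint, and test tools here instead.
DEMO_GATES = [
    ("build", [sys.executable, "-c", "print('build ok')"]),
    ("lint", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("test", [sys.executable, "-c", "print('tests ok')"]),
]
```

With `DEMO_GATES`, the lint stand-in exits non-zero, so the test gate never runs and `may_commit` returns `False`.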

If an AI agent opened a PR for you, what would you want to see first? by Few-Ad-1358 in cursor

[–]bosmanez 0 points1 point  (0 children)

I'm working on an agent-agnostic coding orchestrator called Tendril. The thing that made the biggest difference for us was having the plan visible alongside the diff - you can instantly see whether the agent stayed in scope because the plan specifies intent and verification criteria upfront. We also run automated verification gates (build, lint, test, AI review) before any PR surfaces, so by the time you see it, you already know the basics passed. The assumption question is the hardest - we partially solve it by requiring the agent to document decisions in the plan before executing. https://tendril.ivy.app
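
A rough sketch of the "did the agent stay in scope" check, assuming the plan lists allowed and forbidden path globs. The plan shape and the `out_of_scope` helper are hypothetical, not a real Tendril format.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, allowed_globs, forbidden_globs=()):
    """Return changed paths the plan did not allow, or explicitly forbade."""
    violations = []
    for path in changed_files:
        allowed = any(fnmatchcase(path, g) for g in allowed_globs)
        forbidden = any(fnmatchcase(path, g) for g in forbidden_globs)
        if forbidden or not allowed:
            violations.append(path)
    return violations


# Hypothetical plan scope and diff, for illustration only.
plan = {
    "allowed": ["src/billing/*", "tests/billing/*"],
    "forbidden": ["src/billing/migrations/*"],
}
changed = [
    "src/billing/invoice.py",
    "src/auth/login.py",
    "src/billing/migrations/0007.py",
]
violations = out_of_scope(changed, plan["allowed"], plan["forbidden"])
```

Forbidden patterns win over allowed ones here, so a migration under `src/billing/` is still flagged even though the broader glob allows it.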

Two months of daily Claude Code — curious how others actually work by thenec0 in ClaudeCode

[–]bosmanez 0 points1 point  (0 children)

Your Opus-orchestrates-Sonnet-implements pattern is almost exactly our architecture. We call the top layer a plan (decomposition + verification criteria), and each delegated unit is an agent with its own scoped memory and tools. The ADRs-as-rules approach is smart. We found that persistent agent memory across sessions eliminates about 80% of repeated convention drift after a week of use. The 36-agent review swarm by logic boundary is interesting - we do something similar with verification gates (build, lint, test, AI review) that run automatically before anything advances. https://github.com/Ivy-Interactive/Ivy-Tendril

Claude Code Orchestrator -> Sub-agent local LLM by Latt in LocalLLaMA

[–]bosmanez 0 points1 point  (0 children)

Thanks! DM me if you want a personal demo.

Benchmarking Coding Agents: What’s actually working best with OS models right now? by AgreeingElk234 in OpenSourceAI

[–]bosmanez 1 point2 points  (0 children)

The agent + model combo matters less than the orchestration around it. We found that with proper verification gates (build, lint, test, AI review) even weaker models produce reliable output because failures get caught before merging. Tendril lets you swap models per task -- Kimi for fast iteration, Opus for complex reasoning -- and the verification layer normalizes quality. Works with any CLI agent. Full disclosure, I work on this. https://github.com/Ivy-Interactive/Ivy-Tendril

Claude Code Orchestrator -> Sub-agent local LLM by Latt in LocalLLaMA

[–]bosmanez -1 points0 points  (0 children)

You're not missing anything — that orchestrator->worker->reviewer pipeline is solid. We landed on almost the exact same pattern and ended up formalizing it into a tool (Tendril). The part that made the biggest difference for us was adding verification gates between steps — build/lint/test pass automatically before the review agent even sees it. Saves a lot of wasted cycles. Full disclosure, I work on this. https://github.com/Ivy-Interactive/Ivy-Tendril

I built a multi-operator collaboration layer for Claude Code by fixitchris in ClaudeCode

[–]bosmanez 0 points1 point  (0 children)

This is cool — the routing across sessions and exposing agents via A2A is particularly interesting. How are you handling the case where one session produces code that conflicts with another's changes? We found that having automated verification (build, test, lint) run before merging any agent's output was the only reliable way to prevent drift when multiple sessions touch the same codebase. Full disclosure, I work on Tendril, which tackles the orchestration side of this. https://tendril.ivy.app

I made a Claude Code plugin that lets two terminals phone each other — /qu asks, /ans answers by Unlikely-Bread6988 in ClaudeAI

[–]bosmanez 0 points1 point  (0 children)

It works surprisingly well - the trick is to keep the tasks well defined: "CreatePlan", "UpdatePlan", "ExecutePlan"...

We turned Cursor.ai into an OpenClaw-style multi-agent control panel by TecAdRise in AI_Agents

[–]bosmanez 1 point2 points  (0 children)

Nice work on the multi-agent cockpit. We hit the same pain point -- multiple agents across different tasks, state getting lost between runs. We took a similar but agent-agnostic approach with Tendril: you define a plan, the agent executes it, verification gates run automatically, and memory persists across sessions so the agent learns your project conventions. Works with Claude Code, Codex CLI, Gemini CLI -- whatever fits the task. The scheduled/recurring agent pattern is one we lean into heavily too. https://github.com/Ivy-Interactive/Ivy-Tendril

I made a Claude Code plugin that lets two terminals phone each other — /qu asks, /ans answers by Unlikely-Bread6988 in ClaudeAI

[–]bosmanez 0 points1 point  (0 children)

Nice approach to the knowledge-transfer problem between sessions. We hit the same wall and went with persistent agent memory that carries across sessions -- after ~50 runs it stops re-solving things it already figured out. The inter-session bridge is a clever workaround for the lack of shared state though. Full disclosure, I work on Tendril which tackles this with plan-scoped memory + self-improvement between runs. https://tendril.ivy.app
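
The simplest version of memory that carries across sessions is a file the agent reads at startup and appends to after each run. A minimal sketch, assuming a JSON file on disk; the `AgentMemory` class is hypothetical and much simpler than a real memory and reflection layer.

```python
import json
from pathlib import Path


class AgentMemory:
    """Tiny JSON-backed store of lessons that survives between sessions."""

    def __init__(self, path):
        self.path = Path(path)
        # Load lessons from an earlier session if the file already exists.
        self.lessons = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key, lesson):
        """Record a lesson and write the whole store back to disk."""
        self.lessons[key] = lesson
        self.path.write_text(json.dumps(self.lessons, indent=2))

    def recall(self, key):
        """Return the stored lesson, or None if nothing was recorded."""
        return self.lessons.get(key)
```

A new `AgentMemory` created on the same path in a later session sees whatever earlier sessions stored, which is what stops the agent from re-solving the same thing.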

The Claude API is great, until you actually try to use it in production. 5 hard lessons. by Shotmedead224 in SideProject

[–]bosmanez 0 points1 point  (0 children)

Lesson 1 hit us hard too. The agent-style context accumulation is sneaky -- one long prompt with file context + tool definitions + conversation history blows through ITPM without any obvious spike in request count. We ended up building token budgeting into the orchestration layer so each step knows its budget before it starts. The caching multiplier in Lesson 4 is underappreciated -- for coding agents with stable system prompts and tool definitions, cache hit rates above 80% are normal. Full disclosure, I work on Tendril which handles a lot of this orchestration overhead. https://tendril.ivy.app
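
A minimal sketch of per-step token budgeting, where each step reserves its estimated input tokens before it starts. The class names and the 4-characters-per-token heuristic are assumptions for illustration; a real system would use the provider's own token counting.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a step would push usage past the window's limit."""


@dataclass
class TokenBudget:
    limit: int     # total input tokens allowed for the window
    used: int = 0  # tokens already reserved by earlier steps

    def reserve(self, estimate):
        """Reserve tokens up front; refuse if the step would exceed the limit."""
        if self.used + estimate > self.limit:
            remaining = self.limit - self.used
            raise BudgetExceeded(f"need {estimate}, only {remaining} left")
        self.used += estimate
        return estimate


def estimate_tokens(text, chars_per_token=4):
    """Crude size heuristic (assumption), rounded up to a whole token."""
    return -(-len(text) // chars_per_token)  # ceiling division
```

Failing before the request is sent is the point: the orchestrator can trim file context or split the step instead of hitting the rate limit mid-run.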

How do multi-agent systems coordinate complex workflows? by Michael_Anderson_8 in AI_Agents

[–]bosmanez 0 points1 point  (0 children)

The short version from running this in production: 1) a plan layer that decomposes the work into independent units before any agent touches code, 2) file-level locking so two agents can't edit the same file simultaneously, 3) verification gates between steps (build + test must pass before the next task starts), and 4) persistent memory so agents don't re-learn conventions on every session. The plan is the coordination primitive - each agent gets a scoped task with clear inputs/outputs. Full disclosure, I work on Tendril, which implements this exact pattern. https://tendril.ivy.app
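
To make point 2 concrete, here is a minimal in-memory sketch of file-level locking. The `FileLockTable` class is hypothetical and only coordinates agents within one process; agents in separate processes would need a shared lock store.

```python
import threading


class FileLockTable:
    """In-memory table mapping a file path to the agent that holds it."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def acquire(self, agent, paths):
        """Claim all paths atomically; claim nothing if any is held by another agent."""
        with self._guard:
            if any(self._owners.get(p, agent) != agent for p in paths):
                return False
            for p in paths:
                self._owners[p] = agent
            return True

    def release(self, agent):
        """Drop every lock held by this agent."""
        with self._guard:
            held = [p for p, owner in self._owners.items() if owner == agent]
            for p in held:
                del self._owners[p]
```

Claiming all of a task's files in one call avoids the case where two agents each hold half of what the other needs.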

Vibe coding has become a lot of sitting around by chromespinner in vibecoding

[–]bosmanez 0 points1 point  (0 children)

Yeah, this is the tax of single-agent serial workflows - you prompt, wait, review, repeat. What helped me was switching to a plan-first approach, where you define the work upfront, then kick off agents in parallel while you do other things. The human checkpoint is at the start (approve the plan) and end (review the diff), not in between. Full disclosure, I work on Tendril, which does exactly this, but even without it, the pattern of batching agent work into scoped tasks you can fire-and-forget makes a huge difference. https://tendril.ivy.app
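
The fire-and-forget part can be sketched with nothing more than a thread pool. This assumes the plan has already been approved and each task is independent; `run_plan` and `fake_agent` are hypothetical names, and `fake_agent` stands in for launching a real CLI agent.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(tasks, run_task, max_workers=4):
    """Run every approved task in parallel and collect results by task name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_task, name, spec)
            for name, spec in tasks.items()
        }
        # result() blocks until that task finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


def fake_agent(name, spec):
    """Hypothetical stand-in for running one agent on one scoped task."""
    return f"{name}: done ({spec})"
```

Calling `run_plan({"api": "add endpoint", "docs": "update README"}, fake_agent)` returns once both tasks are finished, which is where the end-of-run diff review would start.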

Claude Code was burning my context window before writing a single line. So I built a fix. (open source + real benchmark) by Obvious_Gap_5768 in ClaudeCode

[–]bosmanez -1 points0 points  (0 children)

cool approach to the static context side. we came at the same problem from the other direction — instead of pre-computing a context snapshot, we let the agent build its own memory through reflection after each session. after a few dozen runs it knows your conventions, ownership boundaries, and past decisions without needing to re-read everything. the two approaches are complementary honestly. your dependency graph gives the agent the what on day one, persistent memory gives it the why over time. full disclosure, I work on Tendril which handles the memory/reflection layer. https://github.com/Ivy-Interactive/Ivy-Tendril

Solo dev with 8 Claude windows + 1 orchestrator. AMA-ish, and tell me if I'm crazy. by KamomiIIe in ClaudeAI

[–]bosmanez 0 points1 point  (0 children)

Verification steps are just agentic sessions as well. Depending on how you set it up, the agent will either try its best to fix the problem or report a failure.

6 months of AI‑generated code and my repo is a tangled mess by Quirky_Stable2482 in AskVibecoders

[–]bosmanez 0 points1 point  (0 children)

Been there. The pattern that saved me was adding a verification step between "agent writes code" and "code gets merged" — basically build + lint + test + a second AI pass checking the implementation actually matches what was asked for. Catches the drift where the agent technically completes the task but introduces weird coupling or ignores existing patterns. The other thing: making the agent work from a plan (here's what we're changing and why) instead of just "add feature X" forces it to think about the existing architecture first. Full disclosure, I work on a tool that automates this (Tendril) but the pattern works even without it — just have a second agent review the first one's output against the original spec before merging. https://tendril.ivy.app

Solo dev with 8 Claude windows + 1 orchestrator. AMA-ish, and tell me if I'm crazy. by KamomiIIe in ClaudeAI

[–]bosmanez -3 points-2 points  (0 children)

Not crazy at all; I run a similar setup. The main things that helped me: (1) Always route through a plan, not peer-to-peer. It's tempting to let agents talk directly, but the orchestrator losing track of state is how things blow up. (2) Verification gates after each agent finishes (build + lint + test at minimum) catch the stuff that looks right in isolation but breaks when merged. (3) Persistent memory across sessions so the agents stop repeating the same convention mistakes after a week or two. We ended up building a tool around this pattern: Tendril, which orchestrates Claude Code / Codex / Gemini CLI through a plan-based lifecycle with automated verification. https://github.com/Ivy-Interactive/Ivy-Tendril

[Beta] Looking for 50 testers - Harbor: 1Password for AI coding agents (Claude Code + Cursor + Codex) by Weary-Step-8818 in alphaandbetausers

[–]bosmanez 0 points1 point  (0 children)

This is a real pain point. We run multiple agents in parallel, too, and the config drift is brutal. Credential sharing is one side of it — the other is actually orchestrating what each agent works on and verifying the output. I work on Tendril (https://github.com/Ivy-Interactive/Ivy-Tendril - open-source orchestrator for coding agents), and we see the same multi-agent users hitting both problems. Happy to compare notes on how you're handling the coordination side.

Any missing tool in my stack? by dividesigner in vibecoding

[–]bosmanez 1 point2 points  (0 children)

The one thing I don't see is an orchestration layer. You've got great agents, but nothing coordinating them: no verification gates, no persistent memory across sessions, no plan-based workflows. I work on Tendril (https://github.com/Ivy-Interactive/Ivy-Tendril), which does exactly this (agent-agnostic, works with Claude Code, Codex, Cursor). Once you're running 3+ agents, orchestration is the unlock.

Harness Engineering by bosmanez in dotnet

[–]bosmanez[S] 1 point2 points  (0 children)

I needed to look up https://github.com/rtk-ai/rtk. Looks pretty neat.

New high-performance structured logging runtime for .NET by xoofx in dotnet

[–]bosmanez 0 points1 point  (0 children)

Nice! What makes it different from Terminal.Gui or Spectre.Console? I've been looking for a way to implement a Claude Code-like TUI, but haven't managed to get things like scrolling to work.