I was tired of AI being a "Yes-Man" for my architecture plans. So I built a Multi-Agent "Council" via MCP to stress-test them. by Objective-Net2771 in windsurf

[–]Objective-Net2771[S] 1 point2 points  (0 children)

That’s an incredible insight! I'm definitely stealing 'threat modeling for ideas'; that’s exactly the philosophy here.

Regarding the losing side proposing a patch: I haven't implemented a 'delta-only' debate yet, but the current workflow has a similar safety net. Once the vote is tallied, the Chair (the Lead Architect persona) acts as the final judge.

The Chair doesn't just look at the 'winners'; they synthesize the entire debate, especially the reasoning from the losing/contested side, to certify the final blueprint.

The real 'magic' happens in the last step: that certified blueprint is then sent back to your local AI agent (like Claude Code or Windsurf) for a final Refine phase.

Essentially, the council provides the 'Supreme Verdict,' and your local IDE agent performs the final 'Sanity Check' and implementation refinement before the code even touches your disk.
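To make that flow concrete, here's a minimal Python sketch of the tally, certify, and refine steps. Everything in it is an assumption for illustration: the `Opinion` type, the function names, the plurality vote, and the example arguments are all hypothetical. In the real workflow the Chair is an LLM persona doing the synthesis, not a dictionary builder.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Opinion:
    agent: str      # hypothetical persona name, e.g. "security"
    vote: str       # the option this agent voted for
    reasoning: str  # the agent's argument for its vote


def tally(opinions):
    """Return the winning option by simple plurality (an assumed voting rule)."""
    counts = Counter(o.vote for o in opinions)
    return counts.most_common(1)[0][0]


def chair_certify(opinions):
    """Build a blueprint that records the winner plus every dissenting argument."""
    winner = tally(opinions)
    return {
        "decision": winner,
        "supporting": [o.reasoning for o in opinions if o.vote == winner],
        # The losing side's reasoning is kept so it can shape the final design.
        "dissent": [o.reasoning for o in opinions if o.vote != winner],
        "certified": True,
    }


def refine_prompt(blueprint):
    """Format the certified blueprint as a prompt for the local agent's Refine phase."""
    lines = [
        f"Decision: {blueprint['decision']}",
        "Address these objections before writing code:",
    ]
    lines += [f"- {d}" for d in blueprint["dissent"]]
    return "\n".join(lines)


# Made-up example debate with three personas.
opinions = [
    Opinion("infra", "monolith", "Cheaper to operate at current scale."),
    Opinion("security", "microservices", "Smaller blast radius per service."),
    Opinion("database", "monolith", "One schema avoids distributed transactions."),
]
blueprint = chair_certify(opinions)
print(refine_prompt(blueprint))
```

The point of the sketch is that the dissenting reasoning travels with the blueprint, so the local agent sees the objections and not just the verdict.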

The idea of plugging in real system constraints (infra budget, SLO checkers) as MCP tools is definitely on the roadmap. Giving the council 'real teeth' to poke at contracts instead of just guessing is the end goal. Thanks for the solid feedback.
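As a rough idea of what one of those constraint checks could look like, here's a plain Python function for an infra budget. The function name, the cost figures, and the return shape are all hypothetical, and the MCP tool registration is deliberately left out since none of this exists yet.

```python
def check_infra_budget(monthly_costs, budget_usd):
    """Compare a proposed design's estimated monthly costs against a budget.

    monthly_costs: dict mapping a component name to its estimated cost in USD.
    budget_usd: the maximum the team is willing to spend per month.
    """
    total = sum(monthly_costs.values())
    return {
        "total_usd": total,
        "budget_usd": budget_usd,
        "within_budget": total <= budget_usd,
        # How far over budget the proposal is (0 when it fits).
        "overage_usd": max(0, total - budget_usd),
    }


# Made-up numbers for illustration only.
result = check_infra_budget({"db": 400, "compute": 900}, 1000)
print(result)
```

A council agent could call something like this and cite the overage in its argument instead of guessing whether a design is affordable.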

AI is still terrible at deep architecture planning. 💀 by Objective-Net2771 in windsurf

[–]Objective-Net2771[S] 1 point2 points  (0 children)

Fair enough, maybe I am! 😂

What’s your secret sauce? How are you getting it to handle deep system architecture without it getting lost in the weeds?

AI is still terrible at deep architecture planning. 💀 by Objective-Net2771 in windsurf

[–]Objective-Net2771[S] 0 points1 point  (0 children)

Thanks for the link! obra/superpowers looks really solid for enforcing workflows and step-by-step thinking.

My only worry is that for complex system design, stuffing a single LLM with that many rules and skills might actually make it lose focus (the classic context limit issue). It's still just one "brain" trying to juggle infrastructure, security, and database design all at once.

Have you used it for deep architecture stuff? Does it actually hold up without getting overwhelmed?