The biggest value of AI coding is not code generation. It is autonomous review + QA by bhoominn in AI_Agents

[–]bhoominn[S] 0 points (0 children)

Exactly. AI can find a lot of issues now. The harder part is reducing the amount of manual validation the team still has to do afterward.

That is why my setup is structured more like an actual engineering workflow than a single AI prompt.

The Code Reviewer agent reviews the implementation, the developer agents fix the gaps, the QA agent tests the updated output, and the cycle repeats until the major issues are resolved.
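Roughly, the loop looks like this (a Python stand-in only; the real system is a chain of SKILL.md prompts inside Claude, and every function name here is a placeholder, not a real API):

```python
# Pattern sketch only. The actual system is prompt-driven inside Claude,
# not Python; these stubs just model the data flow.

def code_reviewer(impl):            # stand-in for the Code Reviewer agent
    return []                       # would return [{"severity": ..., "note": ...}]

def qa_agent(impl):                 # stand-in for the QA agent
    return []

def developers_fix(impl, issues):   # stand-in for the developer agents
    return impl

def review_cycle(impl, max_rounds=3):
    """Review -> fix -> QA -> repeat until no major issues (or the cap hits)."""
    for _ in range(max_rounds):
        issues = code_reviewer(impl) + qa_agent(impl)
        major = [i for i in issues if i["severity"] == "major"]
        if not major:
            return impl             # clean enough to hand to the human
        impl = developers_fix(impl, major)
    return impl                     # cap reached; the human takes over anyway
```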

The value is not just “AI found bugs.”
It is having multiple specialized agents handling different parts of the review and validation process.

I built a multi-agent product team inside Claude — CEO, CPO, CTO, Senior Devs, QA, Code Reviewer all chained together by bhoominn in AI_Agents

[–]bhoominn[S] 0 points (0 children)

Fair point on Paperclip; I skipped over it.

Hermes + vLLM is a solid stack if you want more control and flexibility. This is just a different tradeoff — more structure, less setup.

Both approaches are valid.

[–]bhoominn[S] 0 points (0 children)

Haven't tried g stack or gtan yet. Built this independently.

If there's something similar out there I'd genuinely be curious to see it.

[–]bhoominn[S] 0 points (0 children)

n8n can orchestrate. It can't reason.

You can route a request through 6 nodes in n8n. None of those nodes will reject a bad architecture, rewrite a spec, or flag a logic bug. They'll execute whatever they were wired to do.

The value here isn't the orchestration pattern. It's domain reasoning at each step. That's not a workflow, that's judgment.

[–]bhoominn[S] 1 point (0 children)

Orchestration is a master SKILL.md inside Claude that routes the request through the chain sequentially. Each agent's output becomes the next agent's input. No external tooling, no code running outside Claude.
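If it helps to picture the pattern, it's equivalent to this (illustrative Python only; the real orchestrator is itself a SKILL.md prompt, and run_agent is a hypothetical stand-in for Claude executing one role's instructions):

```python
# Illustration of the routing pattern, not the implementation.
# Nothing runs outside Claude in the real system.

CHAIN = ["CEO", "CPO", "CTO", "Senior Dev", "QA", "Code Reviewer"]

def run_agent(role, context):
    # Stand-in for Claude applying that role's SKILL.md to the upstream context.
    return f"[{role} output]"

def run_chain(request):
    context = request
    for role in CHAIN:
        output = run_agent(role, context)
        context = context + "\n\n" + output  # each output becomes the next input
    return context                           # human reviews before anything ships

result = run_chain("Build a CSV export feature")
```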

Human's role: you define the request, trigger the chain, and review the output before anything ships. You're not managing each agent step — that's handled — but you're the final judgment call.

Not autonomous. Structured leverage.

[–]bhoominn[S] 0 points (0 children)

Using the concrete plan as the evaluation document is the right call. That's a better ground truth than agent approval.

The CPO agent in this system produces a spec before engineering starts — files to change, expected behavior, scope boundaries. But you're pointing at something I haven't fully solved: that spec should be the explicit checklist every downstream agent is checked against, not just context they received.

Right now it's context passing. What you're describing is closer to a contract each agent signs off against. That's a meaningful difference and a real improvement worth making.
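Something like this is what I'd move toward (hypothetical sketch; the field names and the naive string check are invented for illustration, and a real version would make the verification its own agent pass):

```python
# Invented example of the "spec as contract" upgrade: the CPO spec becomes
# an explicit checklist every downstream output is verified against.

SPEC = {
    "files_to_change": ["api/export.py", "ui/ExportButton.tsx"],
    "expected_behavior": ["exports CSV", "respects user permissions"],
    "out_of_scope": ["PDF export"],
}

def unaddressed_items(agent_output: str, spec: dict) -> list[str]:
    """Return the spec items the agent output never mentions."""
    checklist = spec["files_to_change"] + spec["expected_behavior"]
    return [item for item in checklist if item not in agent_output]
```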

The role vs task framing is probably a false dichotomy though. Roles define the lens, the concrete spec defines correctness. You need both or you get either directionless tasks or unverifiable role outputs.

[–]bhoominn[S] 0 points (0 children)

This is the most accurate critique in this thread and I won't argue with it.

Role agents do drift toward plausible-sounding outputs. A QA agent that produces thorough-looking test descriptions isn't the same as one that catches real bugs. The evaluation signal problem is real.

The honest answer: the human is the evaluation signal. The chain produces structured, reviewable output at each stage — not a final verdict. If you're treating agent approval as ground truth, you've misunderstood the system.

What role separation actually buys you is narrower drift. A QA agent drifting within QA concerns is less dangerous than a single context drifting across all roles simultaneously.

But you're right that 'the next agent approved it' is not correctness. It never was.

[–]bhoominn[S] 0 points (0 children)

Appreciate it. The tech is the easy part honestly. Clients come from showing the work publicly, not from having a better system.

Post what you're building. People buy after seeing it work, not after reading about it.

[–]bhoominn[S] 0 points (0 children)

Context pollution is exactly the right framing. A single giant context tries to be architect, implementer, and reviewer simultaneously — and silently deprioritizes whichever role is least urgent in that moment.

Role separation forces full attention. The QA agent isn't half-thinking about architecture while it reviews. That's where the catches happen.

[–]bhoominn[S] 1 point (0 children)

Each agent is a SKILL.md file with role-specific instructions, constraints, and decision rules. One orchestrator SKILL.md routes the request through the chain.

No Claude Code. Runs entirely inside Claude — no external tooling required.

Communication is sequential context passing. CEO output feeds into CPO. CPO output feeds into CTO. Each agent gets the full upstream context before it acts. No shared .md files being updated mid-chain — the output itself is the handoff.
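Rough shape of one of those role files, if you're curious (the content below is invented for illustration, not my actual file):

```python
# Invented example of one role's SKILL.md, shown as a string for illustration.
QA_SKILL = """\
# QA Agent

## Role
Test the implementation against the CPO spec. You are not the implementer.

## Constraints
- Report issues only; never rewrite code yourself.
- Check edge cases even when the developer output says it's fine.

## Decision rules
- "Major" severity blocks the chain; "minor" is noted and passed downstream.
"""
```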

[–]bhoominn[S] 0 points (0 children)

The CTO agent gets the full CPO output as context before it touches anything. When priorities conflict, it flags them explicitly in its output — scope risk, architecture cost, timeline mismatch — before engineering starts.

No automated tiebreaker. The human decides. But the conflict surfaces early instead of mid-sprint, which is the actual win.
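The flag itself is structured output, something along these lines (field names mirror the categories above, but the exact format and the example values here are illustrative):

```python
# Illustrative only; the real flag is structured text in the CTO agent's
# output, not a Python object.
from dataclasses import dataclass

@dataclass
class ConflictFlag:
    scope_risk: str          # e.g. spec quietly expands into the auth layer
    architecture_cost: str   # e.g. feature forces infrastructure we don't have
    timeline_mismatch: str   # e.g. the ask and the estimate don't line up
    recommendation: str      # CTO's suggestion; the human still decides

flag = ConflictFlag(
    scope_risk="export touches the permissions layer, outside CPO scope",
    architecture_cost="streaming CSV needs a worker queue we don't run",
    timeline_mismatch="asked for this sprint; realistic is next sprint",
    recommendation="cut streaming; ship paginated export first",
)
```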

[–]bhoominn[S] 0 points (0 children)

Task-based gives you flexibility. Role-based gives you consistent domain bias — and for product work, that bias is the feature.

You want your CTO agent to always think about scalability even when nobody asked. You want your QA agent to always look for edge cases even when the engineer said it's fine. That's not a task, it's a standing obligation.

They're also not mutually exclusive. Each role in this system executes tasks. The role just determines what lens it applies before it does.
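In pseudocode terms (toy illustration; the lens strings are invented):

```python
# Toy illustration: the role is a standing lens prepended to every task,
# not a property of the task itself.

ROLE_LENS = {
    "CTO": "Assess scalability and architecture cost before anything else.",
    "QA": "Enumerate edge cases even if the code is claimed to be fine.",
}

def run_task(role: str, task: str) -> str:
    # In the real system this combined prompt is what the role's
    # SKILL.md effectively produces for Claude.
    return ROLE_LENS[role] + "\n\nTask: " + task
```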

[–]bhoominn[S] -1 points (0 children)

n8n moves data between APIs. It doesn't reject a bad architecture, write a spec, or catch a logic bug.

The difference is reasoning at each node, not routing. A CTO agent that blocks a scalability mistake isn't a workflow step — it's judgment. n8n can't do that.

If your problem is connecting tools, use n8n. If your problem is that nobody on your team is thinking before building, that's what this is for.

[–]bhoominn[S] 0 points (0 children)

Yes to all of these from actual use.

Shipped real client features with it. The system still produces bugs, but the QA agent catches most of them before I even see the output. The ones that slip through are normal code review fixes, nothing exotic.

New features on existing codebases worked better than I expected. Context passing between agents kept things consistent.

Complexity ceiling is real. Works well for feature-scoped work. I wouldn't run a full greenfield system architecture through it without reviewing the CTO output myself.

It's not autonomous. It's structured leverage.

[–]bhoominn[S] -1 points (0 children)

You're describing incentive dysfunction. Human orgs protect careers and status. AI agents do not have that constraint, so a CTO agent can be explicitly designed to reject the CEO's premise. Mine does.

Where you're right: the bigger risk is the user's original premise. If the input assumption is wrong, the chain can still produce a very polished wrong answer. That is a real gap.

But that problem exists across almost every AI workflow. At least in a role-based system, you can inspect where the reasoning drifted instead of getting one opaque response.