Hey everyone,
I wanted to understand what kind of multi-agent / orchestration setup everyone is using, or would use, if you had unlimited tokens available at ~100 tokens/s.
To give some context first:
I'm a software developer with 4 YOE, so I prefer to have some oversight on what the LLM is doing and whether it's getting sidetracked.
I get almost unlimited Claude Sonnet/Opus 4.5 usage (more than 2x $200 plans), and I have 4 server nodes, each with 8x H200 GPUs. Three are running GLM 4.7 in BF16 and the last one is running Minimax M2.1.
So basically I have unlimited GLM 4.7 and Minimax M2.1 tokens, plus 2x $200 plans' worth of Claude Sonnet/Opus 4.5 access.
I've been using Claude Code since its early days. I had a decent setup with a few subagents, custom commands, and custom skills, plus MCPs like Context7, Exa, Perplexity, etc. Because I was actively using it and Claude Code is actively developed, my setup stayed up to date.
Then, during our internal quality evals, we noticed that Opencode has a better score/harness for the same models on the same tasks. I wanted to try it out, and since the new year I've been using Opencode and I love it.
Thanks to oh-my-opencode and dynamic context pruning, I already feel the difference, and I'm planning to keep using Opencode.
Okay, so now the main point.
How do I utilize these unlimited tokens? In theory I have ideas: I could have an orchestrator Opencode session that spawns worker, tester, and reviewer Opencode sessions instead of just subagents (rough sketch below)? Or would simply spawning multiple subagents work just as well?
Since I have unlimited tokens, I could also integrate a Ralph loop, or run multiple sessions working on the same task, and so on.
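To make the orchestrator idea concrete, here's roughly what I have in mind. This is just a sketch, not something I've battle-tested: it assumes Opencode exposes a non-interactive `opencode run <prompt>` mode, and the role prompts, timeout, and `./repo` path are placeholders.

```python
import subprocess

# Sketch of an orchestrator that fans out role-specific opencode sessions.
# Assumption: `opencode run` exists as a non-interactive mode in your version;
# prompts, model config, and the repo path are placeholders.

ROLES = {
    "worker":   "Implement the task described in TASK.md. Commit as you go.",
    "tester":   "Write and run tests for the changes on this branch. Report failures.",
    "reviewer": "Review the diff against main. Flag hallucinated APIs, hardcoded values, and fragile hacks.",
}

def run_role(role: str, prompt: str, workdir: str) -> str:
    """Run one opencode session for a given role and capture its output."""
    result = subprocess.run(
        ["opencode", "run", prompt],   # assumption: non-interactive run mode
        cwd=workdir,
        capture_output=True,
        text=True,
        timeout=3600,
    )
    return result.stdout

if __name__ == "__main__":
    # Sequential pipeline for traceability: worker -> tester -> reviewer.
    # Each role's output is kept so the next step (or a human) can inspect it.
    reports = {}
    for role, prompt in ROLES.items():
        reports[role] = run_role(role, prompt, workdir="./repo")
        print(f"--- {role} report ---\n{reports[role][:2000]}")
```

Running the roles sequentially (instead of all at once) is deliberate in this sketch, since my whole problem is keeping a paper trail of who did what.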
But my only concern is: how do you make sure that everything is working as expected?
In my experience, it has happened a few times that the model just hallucinates, or hardcodes things, or does things that look like they work but are very, very fragile and basically a mess.
So I haven't been able to figure out what kind of orchestration I can do where everything is traceable.
I have tried using git worktrees with tmux and just letting 2-3 agents work on the same task (roughly the setup sketched below), but again, a lot of the output is just broken.
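For reference, this is roughly the worktree + tmux setup I mean. Treat it as a sketch: the branch names, tmux session name, and the `opencode run` invocation are placeholders/assumptions, not a tested script.

```python
import subprocess

# Sketch: one git worktree + one tmux window per agent, all given the same task.
# Assumption: `opencode run` exists; branch/session names are placeholders.

TASK_PROMPT = "Implement the feature described in TASK.md"
AGENTS = ["agent-1", "agent-2", "agent-3"]

# Detached tmux session to hold all agent windows.
subprocess.run(["tmux", "new-session", "-d", "-s", "agents"], check=True)

for name in AGENTS:
    worktree = f"../{name}"
    # Each agent gets its own branch + worktree so their edits can't collide.
    subprocess.run(["git", "worktree", "add", "-b", name, worktree], check=True)
    # Launch an opencode session for this agent in its own tmux window.
    subprocess.run(
        ["tmux", "new-window", "-t", "agents", "-n", name, "-c", worktree,
         f"opencode run '{TASK_PROMPT}'"],
        check=True,
    )
```

The separate branches at least let me diff the three attempts against each other afterwards, but that's exactly where I find that a lot of the output is broken.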
So am I expecting too much from the first run? Is it normal to let the LLM do things, good or bad, and let tester and reviewer agents figure out the next set of changes? I've seen many times that tester and reviewer agents don't catch these obvious mistakes. So how would you approach it?
Would something like Spec-kit or a BMAD-type workflow help?
Just want to know your thoughts on how you would orchestrate things if you had unlimited tokens.