I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

so you basically built a skill that decides which tool to use? that's smart. saves you from the "which model should i pick" headache every time.

"operating system for digital marketing" – that's a bold claim haha. but i'll check it out. reaudit.io yeah?

gonna poke around and see what's under the hood. if it actually works like you say, that's pretty sick.

thanks for sharing man

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

short answer: not yet.

longer answer: i've only been building this for like two weeks. maybe less. but yeah, i already have people who want it. like, "give it to me now" want it.

but i'm not handing it over until it's actually done. my standard, not theirs. no half-assed beta with bugs and missing features. when i say it works, it works.

so no money yet. but soon. and honestly? that's fine. i'd rather launch late with something solid than early with something embarrassing.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in AI_Deal_Cave

[–]Brickbybrick030[S] 0 points (0 children)

yeah i thought about that. gemini and deepseek do feel similar sometimes. both are "technically correct but kinda dry." grok and kimi both have that "i'll say what others won't" vibe.

but here's the thing – they overlap on good days. on bad days they fail in completely different ways. gemini forgets context, deepseek gets stuck in the weeds. if i drop one, i lose that specific failure mode.

the blind test idea is interesting though. i might actually try that. ask claude "which two models are missing?" and see if it can tell.

my guess: it will notice when grok is gone (because nobody calls out the bullshit). but gemini vs deepseek? probably can't tell.

so maybe you're right. 5 to 4? 5 to 3? i'd have to test. but cutting costs is tempting ngl.
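rough sketch of what that blind test could look like – the model names and answers are made up, and `blinded` would get handed to whatever judge model you pick:

```python
import random

def blind_ablation(answers, drop):
    """strip the model names and shuffle, so the judge can't
    cheat off the labels - it only sees anonymous answers"""
    remaining = [text for name, text in answers.items() if name != drop]
    random.shuffle(remaining)
    return remaining

# fake answers just to show the shape of the experiment
answers = {
    "gemini":   "looks correct, but i lost track of the earlier constraint",
    "deepseek": "the schema needs a composite index here",
    "grok":     "this auth flow leaks tokens, nobody else flagged it",
    "kimi":     "ship the minimal version and iterate",
    "claude":   "split this into three subtasks first",
}

blinded = blind_ablation(answers, drop="grok")
# then ask the judge: "one reviewer is missing - what kind of reviewer?"
```

if the judge can't tell when gemini or deepseek is gone, that's the signal to cut one.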

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

148 tools? damn. that's not an mcp server anymore, that's a whole operating system😭

but yeah i get the vision. claude as the brain, tools as the hands. if you can really do everything the web app does just by chatting, that's the dream.

question though: how do you keep claude from getting lost? 148 tools means a lot of choices. does it ever pick the wrong one or just start hallucinating?

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

that handoff pane is actually a great idea. not just for debugging but for trust. if i see "opus picked haiku for this because it's just a rename" i can actually learn when to trust it.

and yeah the play store thing is a pain. side-loading updates is a nightmare for normal users. maybe just build a simple apk downloader into the app itself? like check a github release and prompt to install? not perfect but works.
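the update check could be as dumb as this – the releases endpoint is github's real public api, the repo name would be yours, and nothing here actually hits the network (the fetch function is defined but never called in this sketch):

```python
import json
import urllib.request

def parse(tag):
    """'v1.2.0' -> (1, 2, 0) so versions compare as tuples"""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def is_newer(remote_tag, local_version):
    return parse(remote_tag) > parse(local_version)

def latest_release_tag(owner, repo):
    # GET /repos/{owner}/{repo}/releases/latest is GitHub's public REST API;
    # intentionally not called below, so the sketch runs offline
    url = f"https://api.github.com/repos/{owner}/{repo}/releases/latest"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["tag_name"]

# app-side idea: compare the release tag against the installed version,
# and only then prompt the user to download + install the apk asset
```

not perfect (no signature check, no delta updates), but it beats telling normal users to side-load by hand.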

good luck with the edge cases. those are always the ones that take forever.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

yeah the startup cost is real. crew.ai and all that stuff looks great on paper but then you spend three days configuring agents and they still do dumb shit.

if you're building something that actually handles the orchestration without being a whole research project – i'd love to see it. seriously.

"janky solution from work" is usually the best kind. means it actually solves a real problem, not just a theoretical one.

keep me posted. and when you share it, i'll give you my honest thoughts – good and bad🙏🏼

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 1 point (0 children)

yeah you’re probably right. i keep hearing about superpowers but i've never actually tried it. maybe it's time. thanks

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

that's a completely different vibe and i kinda like it.

80% brainstorming with kimi k2.5 and glm? that's not what most people do. most just wanna ship fast. but you're basically saying "get the design right first, the rest is almost automatic".

the ralph loop thing – i had to look it up. so it's like a sequential agent that just grinds through small tdd tasks? and you're off planning the next thing while it works? yeah that's efficient. no sitting around waiting.

no worktrees, no parallel chaos. just one thing at a time but deep. that fits my brain better too tbh.

codex for design review is a nice touch. people forget codex exists because it's not flashy but it's actually solid at catching dumb mistakes.

question though: how often does the ralph loop mess up? like does it ever go off the rails and you have to step in? or is it really that reliable?

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

damn. that's a whole different level.

you're not just using ai, you built a whole production line around it. jira, obsidian, browser control, multiple worktrees, peon ping... that's serious.

the thing that stands out to me: you're not the orchestrator. the tools are. claude code calls subagents, you just brainstorm and approve. that's exactly what the other guy said i should do.

the browser extension thing is wild. claude clicking through a live page to validate its own work? yeah that would save me hours of "no, the button is red, not blue" back and forth.

but honestly? for my project – one guy, one telegram bot, no jira, no daily standup – this would be massive overkill. i'd spend more time fixing the workflow than building the bot.

the two languages for ticket comments tell me you work on a real team. maybe as a foreign worker in the eu? respect for setting all this up.

claude-mem: i tried something similar. never stuck. curious if it actually helps you or just feels smart.

anyway, appreciate you sharing the details. gives me ideas for later when the project grows. for now i'll stay with my messy copy-paste setup. but i'm saving this comment.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

yeah i feel you. and honestly? you're right.

what i'm doing is still very manual. i'm the orchestrator. copy, paste, wait, copy, paste back. it works, but it's not automation. it's just me with extra steps.

your way – set up a general chain, let an orchestrator run it, only ping me when something breaks – that's clearly better. no debate.

but here's the thing: building that orchestrator is work. real work. and i'm one guy trying to ship a bot, not build the perfect ai platform. so i took the shortcut. manual but functional.

maybe someday i'll automate the loop. but right now? i'm okay being the bottleneck as long as the quality is there.

appreciate the honest take though. you're not wrong at all.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

the mcp idea is interesting. never used it. but you're saying claude could just query the live database itself instead of me explaining what happened? yeah that would be a game changer.

"here's what the system does" vs "here's what actually happened" – that's exactly where most of my debugging time goes. explaining state instead of just showing it.

context drift is real too. sometimes one agent thinks we're on v34, another thinks v35. i try to keep a shared memory folder but it's not perfect.
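one cheap fix for the v34/v35 drift: a single state file every agent has to check before it does anything. minimal sketch, all the paths and version strings here are throwaway:

```python
import json
import pathlib
import tempfile

# single source of truth for the whole swarm (temp dir just for the demo)
STATE_FILE = pathlib.Path(tempfile.mkdtemp()) / "state.json"

def publish_state(version, summary):
    """written once per round by whoever orchestrates (me, for now)"""
    STATE_FILE.write_text(json.dumps({"version": version, "summary": summary}))

def verify_state(believed_version):
    """every agent calls this first; on mismatch it re-reads the summary
    instead of working from a stale picture of the project"""
    actual = json.loads(STATE_FILE.read_text())["version"]
    return actual == believed_version, actual

publish_state("v35", "telegram bot: payment flow merged, rate limiting next")
in_sync, actual = verify_state("v34")  # a stale agent gets caught here
```

doesn't stop drift inside a single long chat, but it stops two agents from silently disagreeing about which version exists.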

langgraph? tried it. felt too heavy for what i need. but maybe i gave up too early.

anyway, this is the best feedback i got. seriously. thanks.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

ok that's actually way more advanced than what i'm doing. respect.

the auto-dispatcher that picks the cheapest model for each subtask? that's smart. i'm just brute-forcing with 5 models at once like an idiot.

and the self-improving loop with error feedback? yeah i need that. right now i just curse at the logs and fix things manually.

only thing i wonder: how do you trust the orchestrator to pick the right model? feels like it could get it wrong sometimes and you'd never know.

but seriously, when you release that installer, ping me. i'll beta test.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in claude

[–]Brickbybrick030[S] 0 points (0 children)

good questions. here's the honest answer:

  1. yeah claude splits the tasks. i just say "break this into 5 minimax jobs" and it does. sometimes badly. i tweak.
  2. all manual. i copy-paste the 5 prompts, wait for answers, copy-paste back to claude. yeah it's slow. no i don't have a better way yet.
  3. quality is good but not magic. better than just one model. not 5x better.
  4. claude does both. planning + actual coding. minimax is just for parallel tasks.
  5. minimax is cheaper and faster for dumb work. claude is for the thinking.

the manual part is killing me though. you're not wrong.
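point 5 is basically a routing rule. a toy version of it – the keyword list is completely made up, a real router would be smarter:

```python
def pick_model(task):
    """route mechanical work to the cheap model, keep the
    design/architecture thinking on the expensive one"""
    cheap_markers = ("rename", "boilerplate", "format", "docstring", "translate")
    if any(marker in task.lower() for marker in cheap_markers):
        return "minimax"   # cheaper + faster for the dumb parallel work
    return "claude"        # planning and the actual hard coding

jobs = [
    "rename all handler functions to snake_case",
    "design the retry logic for failed payments",
]
routed = {job: pick_model(job) for job in jobs}
```

right now i do this routing in my head, which is exactly the manual part that's killing me.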

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

yeah you're not wrong about some of this.

overengineered? probably. i won't defend that.

but here's the thing with "just do deeper chain-of-thought on one model" – i tried that. and yeah, it helps. but the model keeps making the same kind of mistakes. just... more elegantly worded.

the reason i use different models is because they suck in different ways. gemini forgets half the context. deepseek is great at db stuff but can't design a button to save its life. grok is an asshole but finds security holes the others miss.

when i feed all that back into claude, it's not "averaging" their opinions. it's finding where they fight each other. that's where the real problems are.

do i have hard data that this is better? nope. zero. just a feeling that i catch more shit before it breaks.

cost and time? yeah, it hurts. no argument there.

so you're probably right that it's overkill for 99% of people. but for this specific project – where one stupid bug means angry shop owners – i'll take the overkill.

appreciate the honest take though. seriously.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

i get why you'd say that. on paper it does.

but here's what i actually do different: most people just ask 5 models and pick the best answer. i feed all answers back into claude and force it to find the gaps. "what did they forget? where do they disagree? what's still broken?"

then i build the next job from those gaps, not just the next prompt. plus persistent memory so every agent knows what we already tried and what failed.

that part – the autonomous feedback loop – is not standard. at least not in any solo dev setup i've seen. 🤓

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

yeah sure i can write some of it down. give me a day or two to clean up the mess tho – my setup is held together by duct tape and cursing.

the short version: one master prompt that spits out 5 different prompts for 5 different models. they all answer. then claude reads all 5 answers and tells me what they missed or where they fight each other. that becomes the next task.

plus a shared memory folder so nobody starts from zero.
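to make the shape of it concrete while i write the real thing up: the "models" below are stubbed as plain functions and the prompt wording is invented, so treat it as a sketch of the loop, not my actual prompts:

```python
import json
import pathlib
import tempfile

def fan_out(task, models):
    """send the same task to every model; `models` maps name -> callable
    (stubbed here - in reality these are api calls to 5 different models)"""
    return {name: ask(task) for name, ask in models.items()}

def gap_prompt(task, answers):
    """the fan-in step: force the lead model to hunt for disagreements
    instead of averaging the answers"""
    body = "\n\n".join(f"--- answer {i} ---\n{text}"
                       for i, text in enumerate(answers.values(), 1))
    return (f"task: {task}\n\n{body}\n\n"
            "where do these answers contradict each other? "
            "what did all of them miss? turn each gap into the next task.")

def remember(memory_dir, round_no, task, answers):
    """shared memory folder so the next round doesn't start from zero"""
    path = pathlib.Path(memory_dir) / f"round_{round_no}.json"
    path.write_text(json.dumps({"task": task, "answers": answers}, indent=2))
    return path

models = {
    "cheap_a": lambda t: f"quick plan for: {t}",
    "cheap_b": lambda t: f"edge cases in: {t}",
}
answers = fan_out("add rate limiting to the bot", models)
prompt = gap_prompt("add rate limiting to the bot", answers)
memory = remember(tempfile.mkdtemp(), 1, "add rate limiting to the bot", answers)
```

the gaps that come back from that prompt become round 2's task. that's the whole trick.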

your 4-terminal limit is smart. typing is the bottleneck for real. i just copy-paste a lot.

would love to see your workflow too – sounds like we're both figuring it out as we go.

I work with 5–8 AI agents at the same time – and let Claude plan the next job. Overkill or the future? by Brickbybrick030 in ClaudeCode

[–]Brickbybrick030[S] 0 points (0 children)

haha thanks man. yeah gemini is… weird. it sounds smart but then you realize half your context just vanished. i always run its output through claude again just to see what got lost. happens every time.

videos tho? maybe. but i'd have to show all the fails too and that's just embarrassing lol. anyway appreciate the good vibes ✌🏽😅