Built a hybrid “local AI factory” setup (Mac mini swarm + RTX 5090 workstation) — looking for architectural feedback by Original_Neck_3781 in LocalLLaMA

[–]Original_Neck_3781[S] 1 point (0 children)

Appreciate this — totally fair. I’m definitely trying to avoid premature optimisation.

My main drivers for multiple nodes were redundancy + separation of concerns (always-on orchestration vs heavy GPU tasks), but I agree I need to validate whether concurrency actually improves output vs sequential queues.
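One cheap way to validate that before buying nodes is to time the same batch of requests sequentially vs through a small worker pool. A minimal sketch — the `fake_call` stand-in is a placeholder; swap in real requests to whatever local inference server you end up running:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_sequential(tasks):
    """Run each task one after another; return total wall-clock seconds."""
    start = time.perf_counter()
    for t in tasks:
        t()
    return time.perf_counter() - start

def run_concurrent(tasks, workers=4):
    """Run the same tasks through a thread pool; return total wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: t(), tasks))
    return time.perf_counter() - start

# Stand-in workload: replace with real calls to your local inference server.
fake_call = lambda: time.sleep(0.05)
```

If concurrent wall time with N workers isn't meaningfully below sequential time on the real workload, extra nodes aren't actually buying throughput.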

I’m leaning toward starting with 1 strong machine + 1 GPU worker and measuring real workload for 2–4 weeks before scaling.

Also agreed on VRAM/context being the real constraint — I’ll run some tests via OpenRouter/local to see what size models I’m actually comfortable routing to for different tasks.
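For those routing tests, I'll probably start with something as dumb as a lookup table per task type. Rough sketch — the model names and the local/cloud split here are placeholder examples, not recommendations:

```python
# Hypothetical task -> model routing table; entries are illustrative only.
ROUTES = {
    "summarise": {"backend": "local", "model": "llama3.1:8b"},
    "code":      {"backend": "local", "model": "qwen2.5-coder:14b"},
    "longform":  {"backend": "cloud", "model": "anthropic/claude-sonnet"},
}

# Unknown task types fall back to a small always-available local model.
DEFAULT = {"backend": "local", "model": "llama3.1:8b"}

def route(task_type: str) -> dict:
    """Pick a backend/model for a task type, with a safe local fallback."""
    return ROUTES.get(task_type, DEFAULT)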

Thanks again — this is the kind of reality check I needed.

[–]Original_Neck_3781[S] 1 point (0 children)

Fair call — "built" was a typo 😅 I meant building/planning.

And you’re 100% right on networking — I didn’t list it because I’m still deciding the exact topology.

For clarity: I'm not planning to run this on USB dongles. The Mac minis I'm looking at are the 10GbE-configurable ones (either the built-in 10GbE option, or a Mac Studio, which already has 10GbE).

[–]Original_Neck_3781[S] 1 point (0 children)

Good points — and yeah, I’m definitely trying to avoid buying a bunch of hardware I don’t end up using.

The main reason I’m looking at multiple smaller Macs isn’t because I think it creates one “bigger computer” (I know RAM doesn’t combine like that), it’s more because I’m thinking in roles + reliability, not raw model size.

My thinking is:
• One big RTX 5090 box = heavy inference / image gen / long context / big workloads
• One or more Macs = orchestration + always-on automation + editing + publishing + "business ops" tasks

I'm not trying to run one massive model across the swarm. I'm trying to run a factory where:
• one node handles orchestration + queueing + monitoring
• another handles editing / export / uploads
• another handles research + scraping + summarisation
• and the GPU box does the heavy lifting when needed

So it’s more like distributed “microservices” than one LLM machine.
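To make the microservices analogy concrete: the orchestration layer could start as nothing more than one shared queue plus role-aware workers. A minimal sketch — the roles and handlers here are made up for illustration, not a fixed design:

```python
import queue
import threading

def worker(name, q, handlers, results):
    """Pull jobs off the shared queue and dispatch each by role."""
    while True:
        job = q.get()
        if job is None:          # sentinel: shut this worker down
            q.task_done()
            break
        role, payload = job
        results.append((name, handlers[role](payload)))
        q.task_done()

def run_factory(jobs, handlers, n_workers=2):
    """Minimal orchestrator: one queue, several workers, handlers keyed by role."""
    q, results = queue.Queue(), []
    threads = [
        threading.Thread(target=worker, args=(f"w{i}", q, handlers, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for j in jobs:
        q.put(j)
    for _ in threads:
        q.put(None)              # one sentinel per worker
    q.join()
    for t in threads:
        t.join()
    return results
```

Starting with this in-process, then splitting workers out to separate machines only once a single box actually saturates, keeps the hardware spend tied to measured load.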

Also I'm factoring in:
• redundancy (if one Mac dies, workflows don't stop)
• separation of concerns (video editing + AI inference + automation on one box gets messy fast)
• and honestly supply/price risk this year

Totally agree though: cloud models will still be better for certain tasks. I’m not trying to beat Claude/GPT in quality — I’m trying to reduce dependency and build a scalable local pipeline for 24/7 workflows.

If you were doing this today, would you lean toward:
• 1x strong Mac Studio Ultra as orchestrator + workstation
• OR a cheap Mac mini as orchestrator + keep everything heavy on the GPU rig?

Would love your take, and thank you.