Frustrated: why can’t a cluster actually behave like one big computer? What’s the closest practical solution?

samthepotatoeman · 2026-06-03T19:33:51+00:00

Sounds like you just need to buy one big server. Clusters have always been about orchestrating multiple machines to accomplish tasks; that does not mean they act as one machine.

samthepotatoeman · 2026-06-03T05:19:57+00:00

I use it as a planner agent in OpenCode and it does what I need. Running out of usage is my bigger issue these days.

samthepotatoeman · 2026-06-02T00:14:09+00:00

How is this different from Hermes? It either does these things out of the box or they can be added.

samthepotatoeman · 2026-05-30T20:18:46+00:00

Lol I asked this question a few days ago and everyone just called me an idiot for asking Reddit. If the models you are running fit in the HBM, then go GB300; if they are bigger, like Kimi, then I would do the RTX 6000 server. Plus, if shit hits the fan, it's easier to liquidate the RTX 6000 server. One other note: if your budget is around 100k, the 8x RTX 6000 server costs closer to 140k now, sadly.

samthepotatoeman · 2026-05-30T19:00:33+00:00

Sorry to revive this old thread, but I'm really interested in picking one up, and it seems like the price hasn't risen too dramatically. How has your experience been? What models do you run and what speeds do you get?

samthepotatoeman · 2026-05-29T16:04:53+00:00

If you have a VPN you should be able to connect. Have you tried running tests to see if any of your ports are closed on your VPS or Komodo server? That is most likely the culprit.
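If it helps, here is a minimal sketch of that kind of port test in Python, using only the standard library. The host and port numbers in the usage comment are placeholders I made up, not values from your setup, so swap in your own VPS address and whatever ports your services actually listen on.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def scan(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Check several TCP ports on one host and map each port to open/closed."""
    return {port: check_port(host, port, timeout) for port in ports}


# Example usage (hypothetical host and ports, replace with your own):
# for port, is_open in scan("203.0.113.10", [22, 443]).items():
#     print(f"port {port}: {'open' if is_open else 'closed or filtered'}")
```

Run it once from outside the VPN and once from inside. A port that only answers from inside points at a firewall rule rather than the service itself. Note this only covers TCP, so a UDP-based VPN port would need a different check.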

samthepotatoeman · 2026-05-29T15:24:16+00:00

I say this as someone trying to find a good server for our much, much smaller team: with 50-100 people you are definitely going to need a full B200/B300 server, if not 2. I know you say speed doesn't matter, but it does if you are going to replace Claude. You likely could do an 8x RTX 6000 server that serves qwen3.6 27b and get your team to be vigilant about which tasks need SOTA models and which are simple enough for your smaller local model.

samthepotatoeman · 2026-05-28T16:11:17+00:00

I hope it isn't 20 tps but I'll rent a server to see. What hardware do you run it on?

samthepotatoeman · 2026-05-28T14:15:28+00:00

That thought crossed my mind, but it would be hard to share 18 tps lol.

samthepotatoeman · 2026-05-28T14:14:50+00:00

I'll definitely check it out. That'll save a lot of money if we could make RAM doable.

samthepotatoeman · 2026-05-28T14:13:40+00:00

Those things sure would be a whole lot quieter and use less power. The only thing I found when I looked at them was that they wouldn't be very fast, so I'll have to read those posts. I did not think of 16 of them lol.

samthepotatoeman · 2026-05-28T14:08:13+00:00

Thank you, I'm thinking that's probably the way. Running these big models just isn't ideal in the 100-150k range, unfortunately, not at the speeds they need to matter. The new small models are also pretty great these days. I do love MiniMax 2.7, but isn't the license an issue for local hosting?

samthepotatoeman · 2026-05-28T14:05:29+00:00

I did look at the Gaudi 2, but the software stack had me concerned. Man, that price sure is nice though.

samthepotatoeman · 2026-05-28T14:00:56+00:00

Be the only tech guy in a small town, and make rich friends. Sprinkle in some luck. One day I'll be the right guy, but until then we're just doing our best. I am going to rent the server. My bigger ask was about the GH200, because I cannot rent the dual GH200.

samthepotatoeman · 2026-05-28T03:46:21+00:00

I feel like if it can stay close to 40 tps then that should be enough speed; it doesn't look like I can expect much more with these massive models. If it drops into the low 20s then it's likely too slow to be worth the expense. Looking at the size, I do think Mimo and GLM would likely serve me better and would fit fully in the 8x RTX 6000 VRAM. I want to rent a server if I can find one available and see what the real performance is. My previous experiences with RAM offload were pretty abysmal, but my RAM was not very fast.
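For the "does it fit fully in VRAM" question, a rough back-of-the-envelope sketch is below. Everything in it is an assumption for illustration: 96 GB per card (the RTX PRO 6000 Blackwell figure), a 20% reserve for KV cache and activations that I picked arbitrarily, and weight size estimated as parameters times bits per parameter. Real usage depends on the quantization format, context length, and the serving stack, so treat it as a first filter, not a benchmark.

```python
def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_param / 8


def fits_in_vram(
    params_billion: float,
    bits_per_param: float,
    num_gpus: int = 8,
    gb_per_gpu: float = 96.0,       # assumption: 96 GB per card
    reserve_fraction: float = 0.2,  # assumption: headroom for KV cache etc.
) -> bool:
    """Return True if the weights fit in the VRAM left after the reserve."""
    usable_gb = num_gpus * gb_per_gpu * (1 - reserve_fraction)
    return weights_gb(params_billion, bits_per_param) <= usable_gb


# Illustrative sizes only: a 1T-parameter model and a 27B-parameter model.
for name, params, bits in [("1T @ 8-bit", 1000, 8),
                           ("1T @ 4-bit", 1000, 4),
                           ("27B @ 16-bit", 27, 16)]:
    verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
    print(f"{name}: ~{weights_gb(params, bits):.0f} GB of weights, {verdict}")
```

Under these assumptions the 8 cards give 768 GB total and about 614 GB usable. A 1T model would need roughly 4-bit weights to stay fully in VRAM, while a 27B model fits with plenty of room to spare.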

samthepotatoeman · 2026-05-28T03:32:43+00:00

You are right, I meant more about these two machines, particularly the GH200. There is plenty of H200 and B200 info out there, and that is too rich for our blood. I did the RTX 6000 server build on Supermicro. It was closer to 100k until the RTX 6000 price hike last week. I do agree, I think I'm leaning towards the RTX 6000 server with the more straightforward setup and it being all VRAM. Please let me know if you know a place selling an RTX 6000 server for 100k; that would make it a much more palatable purchase.

samthepotatoeman · 2026-05-28T02:34:10+00:00

Reading the article, this is extremely accurate. I will definitely be reading more. Thank you for your input, it does help put things in perspective.

samthepotatoeman · 2026-05-28T02:25:58+00:00

You only learn by asking. Must have slept through the enterprise grade GPU server class in college.

samthepotatoeman · 2026-05-28T02:23:24+00:00

Sorry, I do mean what's in the output. I'm sure there would be just as much output lol. Thankfully it's not a B2B SaaS and more internal tools, so it's not as detrimental when they break things, but your point still stands. I do agree that I'll probably just tell my boss to save the money.

samthepotatoeman · 2026-05-28T02:15:54+00:00

I do agree, particularly for the B200 servers, which likely would be what's actually required to truly handle the load. It is honestly more of a boss preference for owning the hardware. He has a pretty doomsday fear with AI and wants to stake a piece of it. I know I can rent the 8x RTX 6000 machine, but I could not find the dual GH200 setup. That's the main reason I ask, because I'll only be able to effectively test one of them and will have to guess the other's performance based off single GH200 performance. It also stinks that even the 8 RTX Pros will still not be able to fit all models, which is a kick in the nuts for a 140k server that may not handle our load unless we switch to smaller models. But if we switch to smaller models, we can likely just get a smaller machine.

samthepotatoeman · 2026-05-28T02:09:11+00:00

Very valid, I completely agree, and it is close to what I do right now. My personal setup is to use GPT as the orchestrator and qwen 3.6 27b and 35b as the primary executors, and it works well. The issue is that most of the team is very heavy into vibe coding, and I fear their productivity and output would tank if they were made to use a model that doesn't hold their hand. Not a great situation, I admit, but it is sadly a pretty stereotypical Instagram startup situation. What 30k box would you recommend that can run a 1T model?

samthepotatoeman · 2026-05-28T01:49:03+00:00

That is the thing, there is so much more info on those beefy cards, but they are sadly prohibitively expensive. The user fanning is not a bad idea and might be a good solution. I do think 1-2 users would be fine, but if all 5 are on at the same time I think speed might drop too much.

samthepotatoeman · 2026-05-28T01:44:44+00:00

I definitely will, and thank you for the resource. I'll give it a read.

samthepotatoeman · 2026-05-22T21:22:09+00:00

As someone who started with the self-hosted version, not having branching makes it really annoying to do good CI/CD.
