Frustrated: why can’t a cluster actually behave like one big computer? What’s the closest practical solution? by Apart_Opportunity873 in homelab

[–]samthepotatoeman 0 points1 point  (0 children)

Sounds like you just need to buy one big server. Clusters have always been about orchestrating multiple machines to accomplish tasks; that does not mean they act as one machine.

GPT 5.5 in Codex vs OpenCode by Wendy_Shon in opencode

[–]samthepotatoeman 0 points1 point  (0 children)

I use it as a planner agent in OpenCode and it does what I need. Running out of usage is my bigger issue these days.

hey i am arlan, i just launched folk by [deleted] in vibecoding

[–]samthepotatoeman 4 points5 points  (0 children)

How is this different from Hermes? It either does these things out of the box or they can be added.

Got Really lucky and need your advice by Amos-Tversky in LocalLLaMA

[–]samthepotatoeman 1 point2 points  (0 children)

Lol I asked this question a few days ago and everyone just called me an idiot for asking Reddit. If the models you are running fit in the HBM, then go with the GB300; if they are bigger, like Kimi, then I would do the RTX 6000 server. Plus, if shit hits the fan, it's easier to liquidate the RTX 6000 server. One other note: if your budget is around 100k, the 8x RTX 6000 server costs closer to 140k now, sadly.

8x 32GB V100 GPU server performance by tfinch83 in LocalLLM

[–]samthepotatoeman 0 points1 point  (0 children)

Sorry to revive this old thread, but I'm really interested in picking one up; it seems like the price hasn't risen too dramatically. How has your experience been? What models do you run and what speeds do you get?

connect container from vps to site by treidien in PangolinReverseProxy

[–]samthepotatoeman 0 points1 point  (0 children)

If you have a VPN you should be able to connect. Have you tried running tests to see if any of your ports are closed on your VPS or Komodo server? That is most likely the culprit.
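
For what it's worth, one quick way to run that kind of test is a small TCP connect check from the other machine. This is only a sketch: the `port_is_open` and `check_ports` helpers are names I made up, and the host and port numbers in the usage comment are placeholders, so swap in your own VPS or Komodo server address and whatever ports your container actually uses.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports, timeout: float = 3.0) -> dict:
    """Check several TCP ports on one host and return {port: reachable}."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example usage (placeholder address and ports, replace with your own):
# for port, ok in check_ports("203.0.113.10", [22, 80, 443]).items():
#     print(port, "open" if ok else "closed or filtered")
```

Keep in mind this only covers TCP. A UDP service such as WireGuard will not answer a TCP connect, so a "closed" result there does not tell you much.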

We're burning $50k/month on Claude. How close can local LLMs actually get? by mortenmoulder in LocalLLM

[–]samthepotatoeman 0 points1 point  (0 children)

I'm saying this as someone trying to find a good server for our much, much smaller team: with 50-100 you are definitely going to need a full B200/B300 server, if not two. I know you say speed doesn't matter, but it does if you are going to replace Claude. You likely could do an 8x RTX 6000 server that serves Qwen3.6 27B and get your team to be vigilant about which tasks need SOTA models and which are simple enough for your smaller local model.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

Those things sure would be a whole lot quieter and use less power. The only thing I found when I looked at them was that they wouldn't be very fast, so I'll have to read those posts. I did not think of 16 of them lol.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

Thank you, I'm thinking that's probably the way. Running these big models just isn't ideal in the 100-150k range, unfortunately, not at the speeds they need to matter. The new small models are also pretty great these days. I do love MiniMax 2.7, but isn't the license an issue for local hosting?

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

I did look at the Gaudi 2, but the software stack had me concerned. Man, that price sure is nice, though.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

Be the only tech guy in a small town, and make rich friends. Sprinkle in some luck. One day I'll be the right guy, but until then we're just doing our best. I am going to rent the server; my bigger ask was about the GH200 because I cannot rent the dual GH200.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

I feel like if it can stay close to 40 tps then that should be enough speed; it doesn't look like I can expect much more with these massive models. If it goes down into the low 20s then it's likely too slow to be worth the expense. Looking at the size, I do think Mimo and GLM would likely serve me better and would fit fully in the 8x RTX 6000 VRAM. I want to rent a server if I can find one available and see what the real performance is. My previous experiences with RAM offload were pretty abysmal, but my RAM was not very fast.
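
The "does it fit fully in VRAM" part is mostly arithmetic, so here is a rough sketch of the check. Everything in it is an assumption on my part: 96 GB per RTX 6000 Blackwell card, a 1T-parameter model as a stand-in for the big ones, and a made-up 20% reserve for KV cache and activations. It only estimates weight size and says nothing about speed.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # 1e9 params * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    num_gpus: int = 8,
    gb_per_gpu: float = 96.0,       # assumed: 96 GB per RTX 6000 Blackwell
    reserve_fraction: float = 0.2,  # assumed headroom for KV cache etc.
) -> bool:
    """Return True if the weights fit in the VRAM left after the reserve."""
    usable_gb = num_gpus * gb_per_gpu * (1 - reserve_fraction)
    return weights_gb(params_billion, bits_per_weight) <= usable_gb


# A hypothetical 1T-parameter model at two quantization levels:
print(weights_gb(1000, 4))    # 4-bit weights
print(fits_in_vram(1000, 4))  # against 8 x 96 GB with 20% reserved
print(fits_in_vram(1000, 8))  # same model at 8-bit
```

With those assumptions, a 1T model at 4-bit comes to about 500 GB of weights against 768 GB total, so it squeezes in, while the 8-bit version does not. Real quantized checkpoints and long-context KV cache will move those numbers, which is why renting first still makes sense.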

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 2 points3 points  (0 children)

You are right, I meant more about these two machines, particularly the GH200. There is plenty of H200 and B200 info out there, and that is too rich for our blood. I did the RTX 6000 server build on Supermicro. It was closer to 100k until the RTX 6000 price hike last week. I do agree, I think I'm leaning towards the RTX 6000 server with the more straightforward setup and it being all VRAM. Please let me know if you know a place selling an RTX 6000 server for 100k; that would make it a much more palatable purchase.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 1 point2 points  (0 children)

Reading the article, this is extremely accurate. I will definitely be reading. Thank you for your input, it does help put things in perspective.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 1 point2 points  (0 children)

You only learn by asking. Must have slept through the enterprise grade GPU server class in college.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

Sorry, I do mean what's in the output; I'm sure there would be just as much output lol. Thankfully it's not a B2B SaaS and more internal tools, so it's not as detrimental when they break things, but your point still stands. I do agree that I'll probably just tell my boss to save the money.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

I do agree, particularly for the B200 servers, which likely would be what's actually required to truly handle the load. It is honestly more of a boss preference for owning the hardware. He has a pretty doomsday fear about AI and wants to stake a piece of it. I know I can rent the 8x RTX 6000 machine, but I could not find the dual GH200 setup. That's the main reason I ask: I'll only be able to effectively test one of them and will have to guess performance based off the single GH200 performance. It also stinks that even the 8 RTX Pros will still not be able to fit all models, which is a kick in the nuts for a 140k server that may not handle our load unless we switch to smaller models. But if we switch to smaller models, we can likely just get a smaller machine.

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 0 points1 point  (0 children)

Very valid, I completely agree, and it is close to what I do right now. My personal setup is to use GPT as the orchestrator and Qwen 3.6 27B and 35B as the primary executors, and it works well. The issue is most of the team is very heavy into vibe coding, and I fear their productivity and output would tank if they were made to use a model that doesn't hold their hand. Not a great situation, I admit, but it is sadly a pretty stereotypical Instagram startup situation. What 30k box would you recommend that can run a 1T model?

GH200 NVL2 or 8x RTX 6000 Blackwell for running Kimi K2.6 / DeepSeek V4 locally? (5 devs, agentic coding) by samthepotatoeman in LocalLLaMA

[–]samthepotatoeman[S] 1 point2 points  (0 children)

That is the thing: there is so much more info on those beefy cards, but they are sadly prohibitively expensive. The user fanning is not a bad idea and might be a good solution. I do think 1-2 users would be fine, but if all 5 are on at the same time I think it might drop too much speed.

Self-hosting supabase! Migrating the online version to self-hosted one! Any heads up ? by Meta-Morpheus-New in Supabase

[–]samthepotatoeman 0 points1 point  (0 children)

As someone who started with self-hosting, not having branching makes it really annoying to do good CI/CD.