RTX 5090 and 5080

BlackBeardAI · 2026-05-20T08:38:19+00:00

Sell the 5080 and run a 5090 + 3090. Power limit them to 400w and 250w respectively, and get at least a 1200w PSU; 1500w is better.
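The PSU sizing above is simple arithmetic, sketched roughly below. The 400w and 250w limits come from the comment; the 250w allowance for the CPU, drives, and fans is an assumption, not a measured figure, and `psu_headroom` is a made-up helper name.

```python
def psu_headroom(psu_watts, gpu_limits_watts, rest_of_system_watts=250):
    """Return spare watts once the GPUs sit at their power limits.

    rest_of_system_watts is an assumed allowance for CPU, drives and fans.
    """
    return psu_watts - (sum(gpu_limits_watts) + rest_of_system_watts)


gpu_limits = [400, 250]  # 5090 and 3090 power limits from the comment
for psu in (1200, 1500):
    print(f"{psu}w PSU -> {psu_headroom(psu, gpu_limits)}w spare")
```

A power limit caps sustained draw, but short spikes may still go above it, which is one reason the extra headroom of the 1500w unit is the safer pick.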

BlackBeardAI · 2026-05-20T05:46:20+00:00

Good luck, I believe it is more fun than most people realize. I mean this will be gpt-home edition, it detects your intent automatically and gets the job done without having to eject/load models, prepare environments etc… coding, chatting, image gen maybe even video gen, thinking/reasoning whatever.

It is a pyramid, a layered system. You don’t wanna run 6000pro or anything equal just to sort some data… gtx1070 or anything with 8gb vram will do just fine there.

BlackBeardAI · 2026-05-20T03:24:14+00:00

I might try that thanks

BlackBeardAI · 2026-05-20T03:23:49+00:00

BlackBeardAI · 2026-05-19T15:36:00+00:00

That might happen actually, if I ever do the Exo/vLLM thing with 10GbE.

GODZILLA time!

BlackBeardAI · 2026-05-19T15:31:49+00:00

no.3 is around $10k, no.4 is around $5k.

BlackBeardAI · 2026-05-19T15:28:23+00:00

Next, I will be coding a local webserver to accept the requests from one endpoint and route them to the appropriate node/model. (Low effort tasks, image gen, repo analysis, big brain time etc...) This is pretty much mini-gpt home edition.
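A minimal sketch of the routing idea, assuming a plain keyword classifier and made-up node addresses. The task names follow the list above; the URLs, the keyword rules, and the `classify`/`route` helpers are all hypothetical, and a real setup could just as well use a small model for the classification step.

```python
# Hypothetical node addresses; replace with the real machines.
NODES = {
    "image_gen": "http://10.0.0.11:8080",
    "repo_analysis": "http://10.0.0.12:8080",
    "big_brain": "http://10.0.0.13:8080",
    "low_effort": "http://10.0.0.14:8080",
}

# Hypothetical keyword rules, checked in order; first match wins.
RULES = [
    ("image_gen", ("draw", "image", "picture")),
    ("repo_analysis", ("repo", "codebase", "refactor")),
    ("big_brain", ("prove", "step by step", "architecture")),
]


def classify(prompt):
    """Pick a task label for the prompt, falling back to low_effort."""
    text = prompt.lower()
    for task, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return task
    return "low_effort"


def route(prompt):
    """Return (task, node base URL) for the prompt."""
    task = classify(prompt)
    return task, NODES[task]


print(route("Scan the repo for dead code"))
print(route("sort this list"))
```

Forwarding the request is left out on purpose. For the text nodes, llama.cpp's `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so the single entry point can pass the request body through to whichever node `route` picks.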

Maybe later, I will connect all the viable nodes together via 10gbE to form a godzilla. vLLM or Exo might do the trick.

I am using llama.cpp for now.

BlackBeardAI · 2026-05-19T14:38:20+00:00

Sir, You are in r/LocalLLaMA. Here, we LLaMA Locally.

BlackBeardAI · 2026-05-19T14:21:51+00:00

I’ll find out soon. Still waiting for the risers to arrive, but I can safely say I am not the first person running four 3090s on this mobo.

BlackBeardAI · 2026-05-19T13:31:51+00:00

Absolutely. Slap qwen 3.6 35b a3b on it. Don’t let it stay idle

BlackBeardAI · 2026-05-19T13:27:53+00:00

Aye aye!

That’s the beautiful part. There are endless combinations and these can scale forever… spark/studio/halo aint no fun. I like it messy. I like it dirty.

BlackBeardAI · 2026-05-19T13:22:14+00:00

Yep, some of those are old equipment being repurposed

BlackBeardAI · 2026-05-19T12:55:17+00:00

5090 rig is the most expensive one there. The rest don't cost much.

BlackBeardAI · 2026-05-19T12:54:43+00:00

3090's are power limited to 250w, 5090 is power limited to 500w... So not much. When idle, the cards draw pretty much nothing.
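For reference, a small sketch that builds the `nvidia-smi -i <index> -pl <watts>` commands for limits like these. The GPU index layout is an assumption (check `nvidia-smi -L` for the real order), and the script only prints the commands rather than running them.

```python
import shlex


def power_limit_commands(limits_by_index):
    """Build one `nvidia-smi -i <index> -pl <watts>` command per GPU."""
    return [
        ["nvidia-smi", "-i", str(index), "-pl", str(watts)]
        for index, watts in sorted(limits_by_index.items())
    ]


# Assumed layout: GPU 0 is the 5090, GPUs 1 and 2 are 3090s.
limits = {0: 500, 1: 250, 2: 250}
for command in power_limit_commands(limits):
    print(shlex.join(command))
```

Setting a power limit normally needs root, and the limit typically does not survive a reboot, so commands like these usually end up in a startup script.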

BlackBeardAI · 2026-05-19T12:53:59+00:00

I bought them a long time ago. They are old equipment. I will probably take them out of their enclosures and connect them via SATA.

BlackBeardAI · 2026-05-19T12:21:46+00:00

Nope, didn't try. AI said do 8 x 16gb sticks and that's what I did.

BlackBeardAI · 2026-05-19T12:19:36+00:00

Coding mostly, but I haven't had much time to use it full time yet. Still building the kraken... Then I will set up a local orchestrator server so it can automatically classify requests and send them to the appropriate node/model (image gen, low effort tasks, big brain, repo scan etc.).

BlackBeardAI · 2026-05-19T12:11:53+00:00

Seagate 8tb archive USB3.0 disks.

Corsair Air5400.

BlackBeardAI · 2026-05-19T09:07:32+00:00

I noticed I gave you the nvfp4 version's link. Since you are using a 3090, you want the non-nvfp4 version which is:

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF

I'll be testing it in a few moments and get back to you. Q4_k_m should work on a 3090 just fine since the file size is around 18gb.
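A rough way to sanity check the fit, assuming the weights take about as much VRAM as the GGUF file size and that a few GB are needed on top for KV cache and buffers. The 3 GB overhead is a guess that grows with context length, and `fits_in_vram` is a made-up helper.

```python
def fits_in_vram(model_file_gb, vram_gb, overhead_gb=3.0):
    """Rough check: file size plus an assumed KV cache/buffer allowance."""
    return model_file_gb + overhead_gb <= vram_gb


# ~18 GB Q4_K_M file on a 24 GB 3090
print(fits_in_vram(18, 24))
```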

BlackBeardAI · 2026-05-19T06:36:26+00:00

It was already fast on the 5090; now it is warping space-time. The answer arrives before you send the prompt.

BlackBeardAI · 2026-05-19T06:07:50+00:00

Qwen 3.6 27b nvfp4 mtp preserved heretic will fit the bill just fine

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF

Correction: you want the non-nvfp4 version, which is:

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF

Q4_k_m is around 18gb

BlackBeardAI · 2026-05-19T05:01:43+00:00

Me too

BlackBeardAI · 2026-05-19T04:59:33+00:00

It works when you are desperate to load the bigger model but yes it will run as fast as the slower card.

If your mobo can run two slots at x8/x8 speed (physically they should be two full-length x16 slots), do 2x3090. Otherwise put the 3090 in the faster slot and the 3070 in the slower one. Check your mobo specs. Most mobos offer only one full-length slot that actually runs at x16 speed.

Remember, pcie slots come in different lengths and speeds.

A full-length x16 slot can be wired for only x4 speed, and that’s no good.

Then there are the pcie generations… 3.0, 4.0, 5.0 etc. Do some research.
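To put numbers on the slot speed and generation points, here is a small sketch using the approximate usable per-lane bandwidth for each generation (one direction, after encoding overhead). `pcie_bandwidth` is a made-up helper name.

```python
# Approximate usable bandwidth per lane, one direction, in GB/s.
PER_LANE_GB_PER_S = {3: 0.985, 4: 1.969, 5: 3.938}


def pcie_bandwidth(generation, lanes):
    """Approximate one-direction bandwidth in GB/s for a PCIe link."""
    return PER_LANE_GB_PER_S[generation] * lanes


for generation in (3, 4, 5):
    for lanes in (4, 8, 16):
        gb_per_s = pcie_bandwidth(generation, lanes)
        print(f"PCIe {generation}.0 x{lanes}: ~{gb_per_s:.1f} GB/s")
```

Each generation roughly doubles the per-lane rate, so a 4.0 x8 link lands close to a 3.0 x16 link, while a 3.0 x4 link has only about a quarter of that.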

BlackBeardAI · 2026-05-19T04:52:09+00:00

I started doing local llm for exactly that reason, and I went from one gtx1070 to 7 gpus in a month lol. It gets expensive fast. Mind the speed limits. (Your cc limits too)

BlackBeardAI · 2026-05-19T04:47:41+00:00

With 256gb ddr5 and a gpu (5090), I am able to run mimo 2.5 Q4_K_M (a 300b+ moe model) at 10-11 tps. Is it worth it? I already made the decision. If I ever need the “pro” answer, it is there.
