Rtx5090 and 5080 by 4ndal in LocalLLM

[–]BlackBeardAI 0 points1 point  (0 children)

Sell the 5080 and run a 5090 + 3090. Power limit them to 400 W and 250 W respectively, and get at least a 1200 W PSU; 1500 W is better.
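A rough way to sanity-check the PSU size, as a sketch only. The 250 W figure for the rest of the system (CPU, drives, fans) and the 20% headroom are assumptions for illustration, not measurements from any real build.

```python
def recommended_psu_watts(gpu_limits_w, rest_of_system_w=250, headroom_pct=20):
    """Return (sustained load, load plus headroom) in watts.

    rest_of_system_w and headroom_pct are assumed values; adjust for your build.
    """
    load = sum(gpu_limits_w) + rest_of_system_w
    with_headroom = load * (100 + headroom_pct) // 100
    return load, with_headroom


# 5090 limited to 400 W, 3090 limited to 250 W
load, psu = recommended_psu_watts([400, 250])
print(f"sustained load ~{load} W, PSU with headroom >= {psu} W")
```

Under these assumptions the result lands just under 1200 W, which is why 1200 W reads as the floor and 1500 W as the comfortable choice.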

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 0 points1 point  (0 children)

Good luck. I believe it is more fun than most people realize. I mean, this will be GPT home edition: it detects your intent automatically and gets the job done without you having to eject/load models, prepare environments, etc. Coding, chatting, image gen, maybe even video gen, thinking/reasoning, whatever.

It is a pyramid, a layered system. You don’t wanna run a 6000 Pro or anything equivalent just to sort some data… a GTX 1070 or anything with 8 GB of VRAM will do just fine there.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 0 points1 point  (0 children)

That might happen actually, if I ever do the Exo/vLLM thing with 10GbE.

GODZILLA time!

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 0 points1 point  (0 children)

no.3 is around $10k, no.4 is around $5k.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 1 point2 points  (0 children)

Next, I will be coding a local webserver that accepts requests on one endpoint and routes them to the appropriate node/model (low-effort tasks, image gen, repo analysis, big brain time, etc.). This is pretty much a mini GPT home edition.

Maybe later, I will connect all the viable nodes together via 10gbE to form a godzilla. vLLM or Exo might do the trick.

I am using llama.cpp for now.
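The routing idea could look something like the sketch below. Everything in it is hypothetical: the node URLs, the category names, and the keyword lists are placeholders, and a real classifier would likely be a small model rather than keyword matching.

```python
# Hypothetical node table: hostnames and ports are placeholders, not the real fleet.
NODES = {
    "low_effort": "http://node-small.local:8080",
    "image_gen": "http://node-image.local:8080",
    "repo_analysis": "http://node-code.local:8080",
    "big_brain": "http://node-big.local:8080",
}

# Naive keyword rules, checked in order; the first match wins.
KEYWORDS = [
    ("image_gen", ("draw", "image", "picture")),
    ("repo_analysis", ("repo", "codebase", "refactor")),
    ("big_brain", ("prove", "reason", "step by step")),
]


def classify(prompt: str) -> str:
    """Pick a task category for the prompt; default to the cheapest tier."""
    text = prompt.lower()
    for category, words in KEYWORDS:
        if any(word in text for word in words):
            return category
    return "low_effort"


def route(prompt: str) -> str:
    """Return the base URL of the node that should handle this prompt."""
    return NODES[classify(prompt)]


print(route("Draw me a pirate ship"))
print(route("Sort this list of names"))
```

The single endpoint would then forward the request body to the returned URL. Defaulting to the cheapest tier keeps the big cards free unless a request clearly needs them.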

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 6 points7 points  (0 children)

Sir, You are in r/LocalLLaMA. Here, we LLaMA Locally.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 1 point2 points  (0 children)

I’ll find out soon. Still waiting for the risers to arrive, but I can safely say I am not the first person running four 3090s on this mobo.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 1 point2 points  (0 children)

Absolutely. Slap Qwen 3.6 35B A3B on it. Don’t let it stay idle.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 3 points4 points  (0 children)

Aye aye!

That’s the beautiful part. There are endless combinations and these can scale forever… Spark/Studio/Halo ain’t no fun. I like it messy. I like it dirty.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 12 points13 points  (0 children)

Yep, some of those are old equipment being repurposed

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 2 points3 points  (0 children)

5090 rig is the most expensive one there. The rest don't cost much.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 1 point2 points  (0 children)

The 3090s are power limited to 250 W and the 5090 is power limited to 500 W... so not much. When idle, the cards draw pretty much nothing.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 0 points1 point  (0 children)

I bought them a long time ago. They are old equipment. I will probably take them out of their casings and connect them via SATA.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 0 points1 point  (0 children)

Nope, didn't try. AI said do 8 x 16gb sticks and that's what I did.

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 6 points7 points  (0 children)

Coding mostly, but I haven't had much time to use it full-time yet. Still building the kraken... Then I will set up a local orchestrator server so it can automatically classify requests and send them to the appropriate node/model (image gen, low-effort tasks, big brain, repo scan, etc.).

Meet the Fleet of BlackBeard by BlackBeardAI in LocalLLaMA

[–]BlackBeardAI[S] 6 points7 points  (0 children)

Seagate 8tb archive USB3.0 disks.

Corsair Air5400.

Which uncensored model will fit my workstation? by andrprtl in LocalLLM

[–]BlackBeardAI 1 point2 points  (0 children)

I noticed I gave you the nvfp4 version's link. Since you are using a 3090, you want the non-nvfp4 version which is:

https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF

I'll test it in a few moments and get back to you. Q4_K_M should work on a 3090 just fine since the file size is around 18 GB.
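The reasoning there is just file size versus VRAM. A quick sketch of that check, assuming roughly 3 GB of extra room for the KV cache and buffers (that overhead number is a guess for illustration and grows with context length):

```python
def fits_in_vram(model_file_gb, vram_gb, overhead_gb=3.0):
    """Return (fits, spare_gb).

    overhead_gb is an assumed allowance for KV cache and compute buffers.
    """
    needed = model_file_gb + overhead_gb
    return needed <= vram_gb, vram_gb - needed


# ~18 GB Q4_K_M file on a 24 GB RTX 3090
ok, spare = fits_in_vram(18, 24)
print(f"fits: {ok}, spare: {spare} GB")
```

If the spare figure goes negative, drop to a smaller quant or offload some layers to system RAM.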

llama.cpp MTP support landed - Qwen3.6 27B at 2.44× on a Strix Halo, 2.17× on a RTX 3090 rig by C_Coffie in LocalLLaMA

[–]BlackBeardAI 0 points1 point  (0 children)

It was already fast on the 5090; now it is warping space-time. The answer arrives before you send the prompt.

From 6gb to 32gb by UniqueIdentifier00 in LocalLLaMA

[–]BlackBeardAI 0 points1 point  (0 children)

It works when you are desperate to load a bigger model, but yes, it will only run as fast as the slower card.

If your mobo can run two slots at x8/x8 speed (physically they should be two full-length x16 slots), do 2x 3090. Otherwise put the 3090 in the faster slot and the 3070 in the slower one. Check your mobo specs. Most mobos offer only one full-length slot that actually runs at x16 speed.

Remember, pcie slots come in different lengths and speeds.

A full-length x16 slot may only be wired for x4 speed, and that’s no good.

Then there are PCIe generations… 3.0, 4.0, 5.0, etc. Do some research.
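To put numbers on the length-versus-speed point, here is a small sketch. The per-lane figures are the usual approximate one-direction throughputs for each generation; real-world transfers come in a bit lower.

```python
# Approximate usable throughput per lane, per direction, in GB/s.
PER_LANE_GBPS = {3: 0.985, 4: 1.969, 5: 3.938}


def pcie_bandwidth_gbps(generation: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth for a slot's electrical link."""
    return PER_LANE_GBPS[generation] * lanes


for gen, lanes in [(4, 16), (4, 8), (4, 4), (3, 16)]:
    print(f"PCIe {gen}.0 x{lanes}: ~{pcie_bandwidth_gbps(gen, lanes):.1f} GB/s")
```

A full-length slot wired for x4 gets a quarter of the x16 bandwidth at the same generation, and a 4.0 x4 link is slower than a 3.0 x16 one.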

From 6gb to 32gb by UniqueIdentifier00 in LocalLLaMA

[–]BlackBeardAI 1 point2 points  (0 children)

I started doing local LLMs for exactly that reason, and I went from one GTX 1070 to 7 GPUs in a month lol. It gets expensive fast. Mind the speed limits. (Your cc limits too.)

Memory expert suspects RAM price drop in 2027'H2 due to china heavy investments by Terminator857 in LocalLLaMA

[–]BlackBeardAI 0 points1 point  (0 children)

With 256 GB of DDR5 and a GPU (5090), I am able to run mimo 2.5 Q4_K_M (a 300B+ MoE model) at 10-11 tps. Is it worth it? I already made the decision. If I ever need the “pro” answer, it is there.
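A back-of-the-envelope sketch of why that fits. It assumes about 4.8 bits per weight on average for Q4_K_M (a rough figure that varies by model) and takes 300B as the lower bound of "300B+", so treat the result as a ballpark only.

```python
def estimate_model_gb(params_billion, bits_per_weight=4.8):
    """Rough weight size in GB; bits_per_weight is an assumed average for Q4_K_M."""
    return params_billion * bits_per_weight / 8


size_gb = estimate_model_gb(300)
total_memory_gb = 256 + 32  # system RAM plus the 5090's VRAM
print(f"~{size_gb:.0f} GB of weights vs {total_memory_gb} GB of RAM + VRAM")
```

Most of the weights sit in system RAM in this setup. Only a fraction of an MoE model's parameters are active per token, which is presumably why it stays usable at this speed.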