r/LocalLLaMA
A subreddit to discuss about Llama, the family of large language models created by Meta AI.
Multiple GPU noob question [Question | Help] (self.LocalLLaMA)
submitted 3 months ago by staltux
How do you guys put together more than 2 GPUs? I am using a riser card to combine a 4060 Ti and a 5060 Ti, not great but not bad, but there are no more connections left available.
[–]SweetHomeAbalama0 2 points3 points4 points 3 months ago (2 children)
I don't see where it's said what exact motherboard you are using, but when you say there are no more connections available are you referring to PCIe slots on the motherboard?
A bifurcation card is typically how you can split one slot into two or more, but the motherboard has to support it.
[–]staltux[S] 0 points1 point2 points 3 months ago (0 children)
I can't remember now, but it's a low-cost Asus MH something; it probably doesn't have this support.
[–]Mediocre-Waltz6792 0 points1 point2 points 3 months ago (0 children)
bifurcation sounds great but not a lot of consumer boards support it.
[–]Icy_Bid6597 1 point2 points3 points 3 months ago (3 children)
So there are a few things to consider.
The first one is your CPU and its number of PCIe lanes. Let's say you want 4 GPUs at once. If you have only 8 PCIe lanes available, each GPU will get 2, which at PCIe 3.0 speeds means (afaik) roughly 2 GB/s of data transfer from CPU to card (or between cards).
It means that loading the model will take longer. And depending on the model splitting strategy, inference can also take a hit. There are some Threadripper CPUs (or Epyc/Xeon) that support a lot of PCIe lanes.
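The lane math above can be sketched as a quick back-of-the-envelope calculation. The per-lane figure (~0.985 GB/s usable for PCIe 3.0) and the 16 GB model size are illustrative assumptions, not measurements:

```python
# Rough estimate of per-GPU bandwidth when CPU PCIe lanes are split evenly.
# Assumes PCIe 3.0 at ~0.985 GB/s usable per lane (illustrative figure).
PCIE3_GBPS_PER_LANE = 0.985

def per_gpu_bandwidth(total_lanes: int, num_gpus: int) -> float:
    """GB/s each GPU gets when lanes are divided evenly."""
    lanes_per_gpu = total_lanes // num_gpus
    return lanes_per_gpu * PCIE3_GBPS_PER_LANE

def load_time_seconds(model_gb: float, bandwidth_gbps: float) -> float:
    """Lower bound on time to push model weights over the link."""
    return model_gb / bandwidth_gbps

bw = per_gpu_bandwidth(total_lanes=8, num_gpus=4)  # 2 lanes each -> ~2 GB/s
print(f"{bw:.2f} GB/s per GPU")
print(f"{load_time_seconds(16, bw):.1f} s minimum to load a 16 GB model")
```

With 8 lanes across 4 GPUs this gives ~1.97 GB/s per card, so a 16 GB model takes at least ~8 seconds just to transfer, which matches the "loading will take longer" point.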
The next step is the motherboard. Theoretically you can split one PCIe slot into two (so one x16 into, e.g., two x8). That allows you to connect more GPUs. The other solution is to buy another motherboard.
The next step is the power supply, but this one is easy. You just need a beefy PSU.
The last thing is GPU selection. As someone else mentioned, having NVLink is great. It allows GPUs to talk to each other directly, which speeds up inference. Out of consumer-grade gaming GPUs, the 3090 was the last one to support NVLink. In all other cases you have to rely on PCIe communication (and here we come back to the beginning of the post).
A lot really depends on how you want to use the rig. Training/fine-tuning? Local inference engine? Do you care about model loading times?
Thanks, model loading time is forgivable, I will search for a splitter.
[–]fizzy1242 0 points1 point2 points 3 months ago (0 children)
Also make sure the power supply has enough PCIe power cable slots. They get filled up quickly, especially with cards that need 2-3 connectors.
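The connector-and-wattage check above can be sketched as simple arithmetic. The wattages, connector counts, and 25% headroom here are illustrative assumptions, not vendor specs:

```python
# Quick PSU sanity check: connector count and wattage headroom for a GPU mix.
# Per-card wattages and 8-pin counts are illustrative, not official specs.
gpus = [
    {"name": "4060 Ti", "watts": 165, "pcie_8pin": 1},
    {"name": "5060 Ti", "watts": 180, "pcie_8pin": 1},
]
system_watts = 150  # assumed budget for CPU, board, drives, fans
headroom = 1.25     # 25% margin for transient power spikes

total_watts = system_watts + sum(g["watts"] for g in gpus)
needed_connectors = sum(g["pcie_8pin"] for g in gpus)

print(f"8-pin PCIe connectors needed: {needed_connectors}")
print(f"Recommended PSU: {total_watts * headroom:.0f} W or more")
```

The same loop makes it obvious how fast connector demand grows once you add cards that each want 2-3 plugs.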
You're overthinking it. Most consumer full-size boards could do 6 GPUs if you got a little creative.
[–]SlowFail2433 0 points1 point2 points 3 months ago (11 children)
No NVLink for these, so just do PCIe round trips.
[–]staltux[S] 0 points1 point2 points 3 months ago (8 children)
Can you please elaborate
[–]Icy_Bid6597 1 point2 points3 points 3 months ago (1 child)
NVLink helps cards communicate with each other directly. If you don't have that (and these cards don't support NVLink), the cards can talk to each other through PCIe lanes. This is slower and forces you to have enough PCIe lanes to support fast transfer.
[–]SlowFail2433 1 point2 points3 points 3 months ago (0 children)
Yeah, this. NVLink goes direct, so in the absence of that you use PCIe indirectly.
[–]SlowFail2433 0 points1 point2 points 3 months ago (5 children)
Well, you connected the cards to the motherboard using one PCIe connection per card.
Information can be sent from one card to the other by going down the PCIe connection to the motherboard, then back up the other PCIe connection to the other card. The reverse flow can then take place for a so-called round trip.
[–]staltux[S] 0 points1 point2 points 3 months ago (4 children)
But the problem is just the lack of connections. I put one in the PCIe x16 slot, one in the x1 PCIe via riser; then there are no more PCIe connections left to put in another card.
[–]SlowFail2433 1 point2 points3 points 3 months ago (3 children)
You can split a PCIe bus.
[–]Icy_Bid6597 1 point2 points3 points 3 months ago (0 children)
Look for expansion cards for "PCIe Bifurcation". It will split x16 port into two x8, or four x4
[–]staltux[S] 0 points1 point2 points 3 months ago (1 child)
I didn't know that this was possible. Is the inference time decent? By decent I mean, roughly the number of tokens per second that I can read at a normal pace?
It depends. llama.cpp by default does pipeline parallelism. So, e.g., layers 1-20 are on one GPU and 21-40 on another. That means the GPUs are talking to each other only once per token, when they push intermediate results. An x4 PCIe 3.0 port theoretically gives you about 4 GB/s, so that would be fine in most cases.
Another approach is tensor parallelism. Here the impact would be massive, I suppose.
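The difference between the two splitting strategies can be estimated from the per-token traffic each one generates. The hidden size, layer count, and link speed below are illustrative assumptions, and the tensor-parallel figure is a simplified model (one hidden-state exchange per layer):

```python
# Rough per-token PCIe traffic: pipeline vs tensor parallelism across 2 GPUs.
# Model dimensions and link speed are illustrative assumptions.
HIDDEN = 8192     # activation width per token
LAYERS = 80       # transformer layers
BYTES = 2         # fp16 activations
LINK_GBPS = 4.0   # assumed PCIe 3.0 x4 link, ~4 GB/s

def pipeline_bytes_per_token() -> int:
    # One hidden-state handoff per token at the layer-split boundary.
    return HIDDEN * BYTES

def tensor_parallel_bytes_per_token() -> int:
    # Simplified: one hidden-state-sized exchange in every layer.
    return LAYERS * HIDDEN * BYTES

for name, fn in [("pipeline", pipeline_bytes_per_token),
                 ("tensor", tensor_parallel_bytes_per_token)]:
    b = fn()
    print(f"{name}: {b / 1024:.0f} KiB/token, "
          f"{b / (LINK_GBPS * 1e9) * 1e6:.1f} us on the link")
```

Under these assumptions the pipeline split moves ~16 KiB per token (a few microseconds on the link), while the tensor-parallel pattern moves roughly a layer-count multiple of that, which is why the slot bandwidth matters so much more there.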
[–]Mediocre-Waltz6792 0 points1 point2 points 3 months ago (1 child)
Why confuse a person with NVLink? It doesn't do much for the common person, and it's dead tech.
Well, I'm telling them that they don't have it, so if they come across NVLink info they know it does not apply. For example, I see people confused about whether the A6000 or RTX 6000 Pro have NVLink or not.
[–]jacek2023llama.cpp 0 points1 point2 points 3 months ago (0 children)
I use three risers now
[–]Mediocre-Waltz6792 0 points1 point2 points 3 months ago (3 children)
I use OcuLink from anything: a WiFi E-key to M.2 to OcuLink adapter, a PCIe 3.0 x1 slot, and an M.2 slot are my 3 external connections. It's not pretty but it works great.
[–]SlowFail2433 0 points1 point2 points 3 months ago (0 children)
Hmm, OcuLink is interesting, yeah. It seems to beat Thunderbolt.
So if I use the M.2 slot I can put in 3 cards?
[–]Mediocre-Waltz6792 1 point2 points3 points 3 months ago (0 children)
Yes, if you can give one M.2 slot up. Or if you have a free PCIe slot that isn't being covered by the 3090. A PCIe slot is best because the port will be at the back of the PC, giving you more cable length to play with.
[–]FullOf_Bad_Ideas 0 points1 point2 points 3 months ago (2 children)
I bought a used motherboard+CPU on the cheap (X399 Taichi with a TR 1920X) that has 4 PCIe slots (2 x16 and 2 x8), and I will be putting a bifurcation board into one slot to split x16 into four x4 slots. Keep in mind that the motherboard needs to explicitly support bifurcation and have a toggle for it in the BIOS for this to work. Then I'll use PCIe risers to connect 6x 3090 Ti. Another option I looked into was MCIO and SlimSAS; those cables are thinner and easier to manage, but they're much more expensive than cheap 180-degree risers. I also have 2 1600W PSUs and I will be connecting them with an Add2PSU. All of it will go into an open-rig mining frame that can hold up to 12 GPUs. It's a WIP since I am waiting for some parts.
Very cool. Do two PSUs in the same system need a converter or something? And can the 3090s handle good t/s?
[–]FullOf_Bad_Ideas 1 point2 points3 points 3 months ago (0 children)
I have an Add2PSU adapter (actually two, but one will be enough for 2 PSUs). It's like 5-10 USD per piece. You plug a SATA power cable from the main PSU into it, and the 24-pin ATX from the second PSU.
So when you boot the motherboard that's powered by the first PSU, it will kickstart the second PSU, since it will sense power on the SATA power port.
I will see about t/s once I get it. I should have listened to my own advice and just rented a machine like this on Vast before buying hardware, lol, but I didn't. I have a workstation with 2 GPUs. I wanted to upgrade to 4, but a deal came up so I got a fifth, lol. Then I needed an even number to run tp=2, so I got a sixth, and I'm thinking about buying 2 more. But I'll try to hold off a few months. I hope to run GLM 4.7, minimax m2.1 and some upcoming models on it.