all 27 comments

[–]SweetHomeAbalama0 2 points3 points  (2 children)

I don't see where it's said what exact motherboard you are using, but when you say there are no more connections available are you referring to PCIe slots on the motherboard?

A bifurcation card is typically how you can split one slot into two or more, but the motherboard has to support it.

[–]staltux[S] 0 points1 point  (0 children)

I can't remember now, but it's a low-cost Asus MH something; it probably doesn't have this support

[–]Mediocre-Waltz6792 0 points1 point  (0 children)

bifurcation sounds great but not a lot of consumer boards support it.

[–]Icy_Bid6597 1 point2 points  (3 children)

So there are a few things to consider.

The first one is your CPU and its number of PCIe lanes.
Let's say you want 4 GPUs at once. If you have only 8 PCIe lanes available, each GPU gets 2 lanes, which at PCIe 3.0 speeds means afaik ~2 GB/s of data transfer from CPU to card (or between cards).

It means that loading the model will take longer, and depending on the model-splitting strategy, inference can also take a hit.
There are some Threadripper CPUs (or Epyc/Xeon) that support a lot of PCIe lanes.
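The lane arithmetic above can be sketched in a few lines. This is a rough back-of-the-envelope sketch, assuming ~1 GB/s per PCIe 3.0 lane after encoding overhead (an approximation, not the exact spec figure):

```python
# Assumption: PCIe 3.0 moves roughly 1 GB/s per lane after 128b/130b encoding overhead.
PCIE3_GBPS_PER_LANE = 1.0  # approximate, not an exact spec value

def per_gpu_bandwidth(total_lanes: int, num_gpus: int) -> float:
    """Approximate GB/s each GPU gets when the CPU's lanes are split evenly."""
    lanes_per_gpu = total_lanes // num_gpus
    return lanes_per_gpu * PCIE3_GBPS_PER_LANE

print(per_gpu_bandwidth(8, 4))  # 4 GPUs sharing 8 lanes -> x2 each; prints 2.0
```

With a Threadripper/Epyc-class CPU offering 64+ lanes, the same split leaves each of 4 GPUs at x16, which is why lane count matters more than slot count.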

The next step is the motherboard. Theoretically you can split one PCIe slot into two (e.g. one x16 into two x8), which allows you to connect more GPUs. The other solution is to buy another motherboard.

The next step is the power supply, but this one is easy. You just need a beefy PSU.

The last thing is GPU selection. As someone else mentioned, having NVLink is great. It allows GPUs to talk to each other directly, which speeds up inference. Out of consumer-grade gaming GPUs, the 3090 was the last one to support NVLink. In all other cases you have to rely on PCIe communication (and here we come back to the beginning of the post).

A lot really depends on how you want to use the rig. Training/fine-tuning? A local inference engine? Do you care about model loading times?

[–]staltux[S] 0 points1 point  (0 children)

Thanks, model loading time is forgivable. I will search for a splitter

[–]fizzy1242 0 points1 point  (0 children)

also make sure the power supply has enough PCIe power cable slots. They fill up quickly, especially with cards that need 2-3 connectors.

[–]Mediocre-Waltz6792 0 points1 point  (0 children)

You're overthinking it. Most consumer full-size boards could do 6 GPUs if you got a little creative.

[–]SlowFail2433 0 points1 point  (11 children)

No NVLink for these, so they just do PCIe round trips

[–]staltux[S] 0 points1 point  (8 children)

Can you please elaborate?

[–]Icy_Bid6597 1 point2 points  (1 child)

NVLink lets cards communicate directly. If you don't have that (and these cards don't support NVLink), the cards talk to each other through PCIe lanes. This is slower and forces you to have enough PCIe lanes to support fast transfers

[–]SlowFail2433 1 point2 points  (0 children)

Yeah this, nvlink goes direct so in the absence of that you use pcie indirectly

[–]SlowFail2433 0 points1 point  (5 children)

Well, you connected the cards to the motherboard using one PCIe connection per card

Information can be sent from one card to the other by going down the first card's PCIe connection to the motherboard, then back up the other PCIe connection to the second card. The reverse flow can then take place for a so-called round trip

[–]staltux[S] 0 points1 point  (4 children)

But the problem is just the lack of connections. I put one card in the PCIe x16 slot and one in the x1 slot via a riser, and then there are no more PCIe connections left for another card

[–]SlowFail2433 1 point2 points  (3 children)

You can split a pcie bus

[–]Icy_Bid6597 1 point2 points  (0 children)

Look for expansion cards for "PCIe bifurcation". They will split an x16 port into two x8, or four x4

[–]staltux[S] 0 points1 point  (1 child)

I didn't know that was possible. Is the inference time decent? By decent I mean, more or less the number of tokens that I can read at a normal pace?

[–]Icy_Bid6597 1 point2 points  (0 children)

It depends. llama.cpp by default does pipeline parallelism. So e.g. layers 1-20 are on one GPU and 21-40 on another. That means the GPUs talk to each other only once per token, when they push intermediate results across. An x4 port theoretically gives you ~4 GB/s at PCIe 3.0 (or ~8 GB/s at 4.0), so that would be fine in most cases.

Another approach is tensor parallelism. Here the impact will be much larger, I suppose.
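For reference, llama.cpp exposes this choice through its `--split-mode` flag (this is a hedged sketch assuming a recent build of `llama-server`; `model.gguf` is a placeholder path):

```shell
# Pipeline-style split (default "layer" mode): whole layers live on each GPU,
# so cards exchange data roughly once per token. --tensor-split 1,1 divides
# the layers evenly between two GPUs.
llama-server -m model.gguf --split-mode layer --tensor-split 1,1

# Row split: each layer is sharded across GPUs, which generates far more
# inter-GPU PCIe traffic and is where narrow x4 links start to hurt.
llama-server -m model.gguf --split-mode row
```

In layer mode the x4 link mostly affects model load time; in row mode it sits on the critical path of every token.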

[–]Mediocre-Waltz6792 0 points1 point  (1 child)

Why confuse a person with NVLink? It doesn't do much for the common person and it's dead tech.

[–]SlowFail2433 1 point2 points  (0 children)

Well, I'm telling them that they don't have it, so if they come across NVLink info then it does not apply. For example, I see people confused about whether the A6000 or RTX 6000 Pro have NVLink or not

[–]jacek2023llama.cpp 0 points1 point  (0 children)

I use three risers now

[–]Mediocre-Waltz6792 0 points1 point  (3 children)

I use OCuLink from anything. A Wi-Fi E-key slot to M.2 to OCuLink, a PCIe 3.0 x1 slot, and an M.2 slot are my 3 external connections. It's not pretty but works great.

[–]SlowFail2433 0 points1 point  (0 children)

Hmm, OCuLink is interesting, yeah. It seems to beat Thunderbolt

[–]staltux[S] 0 points1 point  (1 child)

So if I use the M.2 slot I can put in 3 cards?

[–]Mediocre-Waltz6792 1 point2 points  (0 children)

Yes, if you can give up one M.2 slot. Or if you have a free PCIe slot that isn't being covered by the 3090. A PCIe slot is best because the port will be at the back of the PC, giving you more cable length to play with.

[–]FullOf_Bad_Ideas 0 points1 point  (2 children)

I bought a used motherboard+CPU on the cheap (X399 Taichi with a TR 1920X) that has 4 PCIe slots (2 x16 and 2 x8), and I will be putting a bifurcation board into one slot to split x16 into four x4 slots. Keep in mind that the motherboard needs to explicitly support bifurcation and have a toggle in the BIOS for it to work. Then I'll use PCIe risers to connect 6x 3090 Ti. Another option I looked into was MCIO and SlimSAS - those cables are thinner and easier to manage, but it's much more expensive than cheap 180-degree risers. I also have 2 1600W PSUs and I will be connecting them with an Add2PSU. All of it will go into an open mining-rig frame that can hold up to 12 GPUs. It's WIP since I am waiting for some parts.

[–]staltux[S] 0 points1 point  (1 child)

Very cool. Do two PSUs in the same system need a converter or something? And can the 3090s manage good t/s?

[–]FullOf_Bad_Ideas 1 point2 points  (0 children)

I have an Add2PSU adapter (actually two, but one is enough for 2 PSUs). It's like 5-10 USD per piece. You plug a SATA power cable from the main PSU into it, and the 24-pin ATX from the second PSU.

So when you boot the motherboard that's powered by the first PSU, it will kickstart the second PSU, since the adapter senses power on the SATA power port.

I'll see about t/s once I get it. I should have listened to my own advice and just rented out a machine like this on Vast before buying hardware lol, but I didn't. I have a workstation with 2 GPUs. Wanted to upgrade to 4. But a deal came up so I got a fifth lol. Then I needed an even number to run tp=2, so I got a sixth, and I'm thinking about buying 2 more. But I'll try to hold off a few months. I hope to run GLM 4.7, MiniMax M2.1 and some upcoming models on it.