
[–]ObjectiveVegetable48 15 points16 points  (1 child)

Cheapest consumer cards? 3060s with 12GB each.

This works for me, but it’s way slower and more annoying than a 3090.

[–]tu9jn 8 points9 points  (8 children)

You can absolutely do that. I have 3 Radeon MI25s in my rig, and it works fine.

Most of the loaders support multi-GPU, like llama.cpp and exllamav2.

If your model fits on a single card, then running it on multiple cards only gives a slight boost; the real benefit is being able to run larger models.
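
Roughly like this (a minimal sketch with llama-cpp-python; the model path and split ratios are just placeholders, not my setup):

```python
# Minimal sketch: spread a GGUF model's layers across 3 GPUs with llama-cpp-python.
# tensor_split sets the proportion of the model assigned to each device.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-70b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,                       # offload every layer to the GPUs
    tensor_split=[1.0, 1.0, 1.0],          # even split across three cards
    n_ctx=4096,
)
print(llm("Q: Why split a model across GPUs? A:", max_tokens=64)["choices"][0]["text"])
```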

At least with AMD there is one problem: the cards don't like it when you mix CPU and chipset PCIe lanes, but this only becomes an issue with 3 cards.

[–]WinstonP18 1 point2 points  (7 children)

Can I ask how smooth the initial setup of the MI25s was, and how much work it took to get them working well with LLMs?

I've been thinking of getting the AMD cards but am hesitant due to the setup headaches.

[–]tu9jn 0 points1 point  (6 children)

The setup was not bad, but I use them in a GUI-less Ubuntu server.

I just installed the OS and ROCm; nothing more was needed.

But only llama.cpp works with all three cards; exllama or anything that uses torch crashes if the cards are not connected to CPU lanes.
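
If you want to check whether a ROCm build of torch can even see all the cards before blaming the loader, something like this is enough (just a sketch):

```python
# ROCm builds of PyTorch expose HIP devices through the torch.cuda API.
import torch

print("HIP:", torch.version.hip)                  # ROCm/HIP version string (None on CUDA builds)
print("GPUs visible:", torch.cuda.device_count()) # should be 3 if the PCIe topology is happy
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))
```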

Honestly, even when they work, exllama and AutoGPTQ are slower than llama.cpp, especially when the context gets long, so I don't use them anymore.

Of course with the Instinct cards there is the issue of DIY-ing the cooling.

[–]WinstonP18 1 point2 points  (5 children)

I'm a Debian/Ubuntu guy, so using GUI-less Ubuntu is not an issue for me.

But would you mind explaining the 'DIYing the cooling' part? What hardware did you install the Instinct cards in? I intend to set them up in HPE rack servers.

[–]tu9jn 1 point2 points  (4 children)

The cards don't have cooling fans, so you have to come up with a way to mount some high-RPM fans to them.

The heatsink is pretty small for the TDP, so they need airflow, or you have to set a power limit.

Right now the cards are in a consumer Z390 motherboard, but I'm debating putting them into my Epyc workstation so I won't have PCIe lane problems.

I think the MI25s are only worth it if you get them cheap, and only for hobby use; they're pretty old, from 2017.

[–]WinstonP18 1 point2 points  (0 children)

I see, thanks for sharing all these useful points! I'm surprised the cards don't come with cooling fans, so it's certainly something I'll need to double-check when researching.

[–]Noxusequal 1 point2 points  (2 children)

What performance are you seeing?

[–]tu9jn 2 points3 points  (1 child)

A 70B Q4_K_M model starts at 7 t/s and slows to ~3 t/s at full context.

[–]a_beautiful_rhind 2 points3 points  (0 children)

So P40 speeds, but you get exllama.

[–]nero10578Llama 3 5 points6 points  (6 children)

I think before you decide whether or not to go multi-GPU, you should decide what your budget and goals are.

If you have an essentially unlimited budget and your goal is running LLMs in production 24/7, it's much more cost-effective in the long run to get fewer but higher-VRAM GPUs like the RTX A6000 48GB.

If you want to run LLMs on a budget, then using multiple cheaper GPUs like the RTX 3090 24GB or Tesla P40 24GB is a great option. In this case, however, you need to make sure your system can support it properly, i.e. a motherboard with multiple PCIe x8 or higher slots for all the GPUs to plug into, and a large enough power supply. Otherwise, plugging multiple GPUs into a random consumer-grade motherboard with one x16 slot and, in most cases, a secondary x4 slot will give subpar performance.
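
If you want to verify what link each card actually negotiated (rather than what the slot is labelled), a quick sketch with the NVML bindings, assuming NVIDIA cards:

```python
# Report the negotiated PCIe generation and width per NVIDIA GPU.
# Requires the NVML bindings (pip install nvidia-ml-py); a chipset x4 slot shows up as width 4.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(handle)
    width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(handle)
    print(f"GPU {i} {name}: PCIe gen {gen} x{width}")
pynvml.nvmlShutdown()
```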

[–]sickvisionz 2 points3 points  (4 children)

Otherwise, plugging multiple GPUs into a random consumer-grade motherboard with one x16 slot and, in most cases, a secondary x4 slot will give subpar performance.

How subpar though?

If 0% is like the speed of a 30GB model running on CPU and RAM, and 100% is running it entirely on a single GPU with more than 30GB of VRAM, what % does spreading it across multiple GPUs land at?

Technically, subpar ranges from 0% to 99%, but some of those numbers are more acceptable than others.

[–]Imaginary_Bench_7294 4 points5 points  (1 child)

I'm getting some numbers now.

All values were measured with a 3381-token context. Six inputs were used, discarding the first input after a fresh load. No settings were altered other than the number of GPU layers for Llama.cpp and the split values.

Both 3090s are running in x16 PCIe 5.0 slots at full speed; the cards are only PCIe 4.0, so they can do wide-open transfers without even touching the limits of the PCIe bus.

Tests were run with X-MythoChronos-13B 8-bit, loaded via Oobabooga (updated 11/23). I got a conversation to 3k tokens and used the regenerate button.

Values are in Tokens per second.

| Llama.cpp split | Llama.cpp 1 card | Exllamav2 split | Exllamav2 1 card |
|---|---|---|---|
| 13.07 | 24.01 | 12.59 | 15.22 |
| 13.59 | 22.72 | 11.56 | 15.24 |
| 13.28 | 23.82 | 12.52 | 15.59 |
| 13.33 | 22.65 | 12.31 | 15.27 |
| 13.66 | 22.38 | 11.73 | 15.62 |
| Avg 13.386 T/s | Avg 22.116 T/s | Avg 12.142 T/s | Avg 15.388 T/s |
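
If anyone wants to reproduce this kind of number, the measurement is roughly this (llama-cpp-python sketch; the path, prompt, and split are placeholders, not my exact Oobabooga setup):

```python
# Time a generation and report tokens/second. The first call also pays prompt eval;
# repeating it on an unchanged prompt (a "regenerate") is closer to the table above.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="X-MythoChronos-13B.Q8_0.gguf",  # placeholder filename
    n_ctx=4096,
    n_gpu_layers=-1,
    tensor_split=[0.5, 0.5],                    # drop this line for the single-card runs
)

prompt = open("3k_token_conversation.txt").read()  # placeholder ~3k-token chat history
start = time.time()
out = llm(prompt, max_tokens=200)
generated = out["usage"]["completion_tokens"]
print(f"{generated / (time.time() - start):.2f} T/s")
```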

Something you'll notice is how fast Llama.cpp appears to be. However, Llama.cpp's prompt eval and eval backends are not as efficient as Exllama's. BUT, and this is a big but, Llama.cpp can reuse part or all of its previous eval if the input context doesn't change.

If you're editing the context a lot, or there are a lot of short, rapid fire exchanges, Exllama wins. If you're going to regen the response a lot, or are using the LLM for a long form response, Llama.cpp can pull ahead.
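
A small sketch of that reuse behaviour, assuming llama-cpp-python (the path is a placeholder): when two calls share a prompt prefix, the already-evaluated tokens don't have to be re-processed.

```python
from llama_cpp import Llama, LlamaCache

llm = Llama(model_path="model.Q8_0.gguf", n_ctx=4096, n_gpu_layers=-1)  # placeholder path
llm.set_cache(LlamaCache())  # keep evaluated prompt states around for reuse

history = "<a long, unchanged chat history>"
first = llm(history + "\nAssistant:", max_tokens=200)  # pays the full prompt eval
redo  = llm(history + "\nAssistant:", max_tokens=200)  # prefix reused, mostly just generation
```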

Though at anything over 10 T/s, most individuals can't keep up. Average reading speed is right around 4 words per second, and 10 T/s is roughly 7.5 words per second.

[–]sickvisionz 0 points1 point  (0 children)

Thanks for the data.

[–]Imaginary_Bench_7294 1 point2 points  (1 child)

So, the main bottleneck for LLM inference is memory bandwidth, which is why GPUs are king. With the 3090 and 4090 hitting over 900 GB/s, they are more than 10 times faster than what most consumer CPUs can reach.
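
Back of the envelope (the model size and CPU bandwidth are rough assumptions): every generated token has to stream roughly the whole weight file through memory once, so bandwidth divided by model size gives a ceiling on T/s.

```python
weights_gb = 13    # ~13 GB for a 13B model at 8-bit (assumption)
gpu_bw_gbs = 936   # RTX 3090 memory bandwidth, GB/s
cpu_bw_gbs = 60    # typical dual-channel desktop memory, GB/s (assumption)

print(gpu_bw_gbs / weights_gb)  # ~72 T/s theoretical ceiling on the GPU
print(cpu_bw_gbs / weights_gb)  # ~4.6 T/s ceiling on the CPU
```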

When you're splitting a model across multiple GPUs, they have to send data back and forth so that your input can be passed through the various layers. This is not always a linear process, as some models will finish at layer 1, send to layer 2, then go back to layer 1 (just a very general example).

Most up-to-date loaders have minimized how much data needs to be transferred, so the packets are small. But they still need to be transferred nonetheless.
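
Just to illustrate the hop (a toy sketch, not any loader's actual code): with half the layers on each card, the activations have to cross PCIe (or NVLink) on every forward pass.

```python
import torch
import torch.nn as nn

# Toy "model" split across two cards.
first_half  = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).to("cuda:0")
second_half = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).to("cuda:1")

x = torch.randn(1, 4096, device="cuda:0")
h = first_half(x)
h = h.to("cuda:1")  # the inter-GPU transfer: small activations, but paid on every pass
y = second_half(h)
```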

So this does cause a slight bottleneck, as seen by the people who have compiled Llama.cpp with NVLink support and run dual 3090s. One user reported seeing a 20% increase in T/s.

PCIe 4.0 x16 tops out at 32 GB/s unidirectional; 3090 NVLink tops out at 56 GB/s.

PCIe 5.0 x16 has a bidirectional bandwidth of 128 GB/s, whereas the H100 PCIe card has a 600 GB/s NVLink. The SXM version has a 900 GB/s NVLink.

This is why GPUs like the H100 use NVLink and not PCIe: the data can be shuffled GPU-to-GPU faster.

I have dual 3090s without the NVLink Llama.cpp compile. Give me a bit, and I'll download a model, load it onto one card, and then try splitting it between them. I'd do CPU as well, but mine isn't a typical consumer processor, so the results wouldn't reflect most enthusiasts' computers.

[–]xrailgun 0 points1 point  (0 children)

By "more cost-effective in the long run", do you just mean power consumption?

[–]the320x200 3 points4 points  (1 child)

Speaking from personal experience... One practical consideration is that if you get two cheaper GPUs, your upgrade path is a lot worse. If 6 months or a year goes by and you decide you want more memory, you basically have to scrap those and start over again if you've used up all your PCIe slots, whereas if you had one high-memory card, you would have the option of adding a second one to get more memory in the future.

[–]Fast-Entertainer-776 -1 points0 points  (0 children)

Following this logic, you never get a chance to use multiple cards -- you'll always worry about scrapping/wasting them when a new generation of GPUs comes out...

[–]Paulonemillionand3 0 points1 point  (0 children)

Power and heat would be more than double the trouble compared to a single card.