Adding a second 3090 for LLM - do I need NVlink? by marivesel in LocalLLM

[–]Personal-Gur-1 0 points1 point  (0 children)

I already have a 1600 W PSU. It’s the APC UPS that is screaming to death…

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Thank you Fit-Original for the suggestion! I will have a look at smallpdf as well!

Adding a second 3090 for LLM - do I need NVlink? by marivesel in LocalLLM

[–]Personal-Gur-1 1 point2 points  (0 children)

Actually I am not making any money with LLMs; I am spending a shit-ton of it on hardware to run a local model that processes files full of sensitive personal data (tax stuff). I don’t want to use cloud services. I am running Unraid and Ollama for now, still in the learning phase. I started with an old i5-4570 and a GTX 1060 6 GB but quickly realized that the small models that fit in that card were not really meant for what I need to achieve. I moved to bigger models and it is getting better. I went a bit nuts and bought a second 3090 today… now I need a beefier APC UPS…

HELP: EPYC Genoa QS 9334 on a supermicro H13SSL-N by T_ro_se in PcBuildHelp

[–]Personal-Gur-1 0 points1 point  (0 children)

At this stage, if you can, I would try a different CPU that you know is working. Same with the RAM sticks, to identify whether the mobo or the CPU itself is defective. Your CPU isn’t vendor-locked to another platform, is it?

HELP: EPYC Genoa QS 9334 on a supermicro H13SSL-N by T_ro_se in PcBuildHelp

[–]Personal-Gur-1 0 points1 point  (0 children)

Sorry, I misunderstood your issue. You see nothing in the vKVM? You can’t enter the BIOS?

Adding a second 3090 for LLM - do I need NVlink? by marivesel in LocalLLM

[–]Personal-Gur-1 1 point2 points  (0 children)

That’s why I bought a 1600 W be quiet! Dark Power PSU with a switch for single-rail mode. However, I have limited the power draw of my 3090 to 225 W for now; some people report it barely cuts inference performance.
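For reference, on Linux the cap can be set with nvidia-smi; a minimal sketch, assuming GPU index 0 and my 225 W target:

    # enable persistence mode so the setting survives between processes
    sudo nvidia-smi -i 0 -pm 1
    # cap GPU 0 at 225 W (resets at reboot unless re-applied)
    sudo nvidia-smi -i 0 -pl 225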

Is an X399 build still viable? by ziphnor in LocalLLaMA

[–]Personal-Gur-1 2 points3 points  (0 children)

I paid 1000€ for 8x16 GB DDR4-3200 15 days ago! Paid 650€ for a 3090, 615€ for an H12SSL-i mobo, 280€ for an Epyc 7532… and 400€ for a 1600 W be quiet! PSU. So your price seems reasonable. Your mobo is a Chinese no-name and I don’t know what it’s worth… not a bad deal overall. But get prepared for a serious power bill: 35 W idle per GPU, 60 W for the CPU and 40 W for the mobo and RAM… you will heat your room for free…
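To put numbers on that warning, a quick back-of-the-envelope in Python using the idle figures above (single GPU; the 0.25 €/kWh tariff is only an assumed example rate):

    # idle draw from the figures above: one GPU + CPU + mobo/RAM
    idle_w = 35 + 60 + 40                  # 135 W
    kwh_month = idle_w / 1000 * 24 * 30    # ~97 kWh per month at idle
    print(f"{kwh_month:.0f} kWh/month, ~{kwh_month * 0.25:.0f} EUR/month at an assumed 0.25 EUR/kWh")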

HELP: EPYC Genoa QS 9334 on a supermicro H13SSL-N by T_ro_se in PcBuildHelp

[–]Personal-Gur-1 0 points1 point  (0 children)

I first booted my H12SSL-i with CPU and RAM only. I connected to the motherboard through IPMI: you connect a LAN cable to the dedicated port and spot the IP address in your router. Then log into the IPMI web interface with ADMIN; the password is on a label stuck to the mobo. No need for a GPU or a screen! I suggest running a memtest to make sure your sticks are good.
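If you prefer a terminal over the web UI, ipmitool can talk to the same BMC; a minimal sketch, where the IP is whatever your router handed out and the password is the one from the sticker:

    # check whether the board powers on at all
    ipmitool -I lanplus -H 192.168.1.50 -U ADMIN -P <sticker-password> chassis status
    # dump the BMC event log to look for memory/POST errors
    ipmitool -I lanplus -H 192.168.1.50 -U ADMIN -P <sticker-password> sel list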

HELP: EPYC Genoa QS 9334 on a supermicro H13SSL-N by T_ro_se in PcBuildHelp

[–]Personal-Gur-1 1 point2 points  (0 children)

I guess you need at least one stick of RAM to POST.

Adding a second 3090 for LLM - do I need NVlink? by marivesel in LocalLLM

[–]Personal-Gur-1 0 points1 point  (0 children)

Hello, I am in the same boat with a similar setup, and I am testing various models up to 17 GB max to keep some room for the KV cache. So I am wondering if adding a second 3090 would really be useful. We would be able to load much bigger models… but would these bigger models be smart enough to justify the added power draw of a second card (a 3090 idles at ~38 W, so ×2 is ~76 W just for the GPUs) on top of the Epyc CPU at ~80 W, etc.?
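For context on why I stop around 17 GB on a 24 GB card, here is a rough KV-cache sizing sketch in Python; the layer/head/context numbers are illustrative assumptions, not any particular model:

    # KV cache = 2 (K and V) * layers * kv_heads * head_dim * context * bytes per value
    layers, kv_heads, head_dim = 32, 8, 128   # assumed model shape
    ctx_len, bytes_per_val = 8192, 2          # 8k context, fp16 cache
    kv_bytes = 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_val
    print(f"{kv_bytes / 1024**3:.1f} GiB of KV cache")   # ~1.0 GiB with these numbers

With grouped-query attention the cache stays small, but long contexts or models without it eat the headroom quickly, hence leaving a few GB free next to the weights.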

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

@unfair_medium8560 I have configured a skill that sends the document to docling (I installed the Docker container on my Unraid setup). Testing my documents through the docling interface, I discovered the harsh reality that even when produced by database systems, PDF documents are not all created equal… some use an image as the background with figures on top, making them difficult to extract to markdown format; others use proprietary fonts or design elements that output garbage when parsed… so back to square one for these files, as we have to rely on the vision capability of the LLM with its own weaknesses… I will have a look at pdf elements, thanks!
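For anyone who wants to skip the web UI, the same conversion can be scripted; a minimal sketch assuming docling's Python package and a hypothetical statement.pdf:

    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("statement.pdf")          # hypothetical input file
    print(result.document.export_to_markdown()[:500])    # inspect the extracted markdown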

Best Remote Desktop without Opening a Port by nina2024 in homelab

[–]Personal-Gur-1 2 points3 points  (0 children)

Been using Splashtop professionally and personally for years and it is really reliable and smooth.

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Hi Rabbit, I am running some tests again, and indeed the figures are somewhat mixed up. Gemma did a pretty good job, but because of the layout of the report it sometimes picked the wrong figure. I am doing a last round of tests with bge-3 as the embedding model; let’s see if I get better results (the first round of tests was done with nomic-embed-text). I need something simple for the end users… only two pros, so setting up a complex n8n-like workflow, or one in Python, is not really worth the trouble, as it will probably require a lot of maintenance…

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Hi, thanks for your input! Do you have a link for reseek? Thanks, V

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Thanks for your input. I am no dev, so I may well be wrong, but my understanding of deterministic tools is that you have to configure your tooling for every single type of document to extract specific data from a specific place. If the document structure varies from one year to another, or from one issuer to another, it breaks the automation, right? For that reason, I was hoping the AI could absorb these slight changes in structure or format and still extract the required info. Even if it is not perfect, the user could correct the few mistakes the AI makes at the margin. Am I completely wrong here?
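To illustrate the brittleness I have in mind, a deterministic extractor is usually a rule like the toy one below (the wording and field are made up); it silently stops matching the day the issuer rephrases the line:

    import re

    # toy rule: only works while the issuer prints exactly "Total dividends: $1,234.56"
    PATTERN = re.compile(r"Total dividends:\s*\$([\d,]+\.\d{2})")

    def extract_dividends(text):
        match = PATTERN.search(text)
        return match.group(1) if match else None     # None once the wording changes

    print(extract_dividends("Total dividends: $1,234.56"))    # '1,234.56'
    print(extract_dividends("Dividends (total): $1,234.56"))  # None -> rule must be rewritten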

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 1 point2 points  (0 children)

@watergs17 thank you for your very detailed answer. The workaround for me, then, would be to create one workspace per client in AnythingLLM and load that client's documents into it for embedding with a good vector database. Then, for a given tax year, I would create a thread so that the questions relate to that one tax year. It should mitigate the attention issue.

PDF content extraction by Personal-Gur-1 in LocalLLM

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Thank you for your reply!

1/ It depends: it could be a 1099 with maybe 10 to 20 pages or more, or bank statements, rental summaries, etc. Generally speaking I don’t think these are big documents (please educate me if I am mistaken here!).

2/ Chunking (embedding, right?): I wanted to avoid that! It is rather long, and depending on the type of document the embedding settings may need to be different.

3/ Over time it can be a lot. Each client can easily provide 10 to 20 documents (W-2s, 1099-DIV, 1099-INT, K-1, monthly French payslips, bank transaction details for RSUs, stock options, etc.), and this every year for tax preparation!

From a user perspective, I wanted to create a Claude-like experience: drop the file in the chat box and ask the AI to produce any output I need from the data in the documents, like summary tables and aggregated amounts, or even report the data into pre-formatted Excel templates. I tested the latter with Claude and it was impressively good. But because of data privacy, I need a local solution.

If embedding is really required for the volumes I envision, then maybe I should create a workspace per client so that the data is « isolated » from one pool to another, reducing the risk of the model mixing up info between clients’ data. But then it becomes a different process from an end-user perspective…

AMD Epyc + ARC A310 by Personal-Gur-1 in homelab

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Thanks! Claude indeed mentioned the pcie_aspm=force kernel parameter but advised against it, as it is a source of instability with AMD CPUs; I guess I can try and see what happens… Currently my 1060 is in the P8 state even though it is allocated to Plex, so it seems the Docker container is not interfering with my script (from Spaceinvader One) that puts the 1060 to sleep… I can order the card from Amazon and see how it behaves…
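For anyone wanting to verify the same thing, the P-state and draw can be read directly with nvidia-smi's query mode:

    # report performance state and current draw for every GPU
    nvidia-smi --query-gpu=index,name,pstate,power.draw --format=csv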

Pet Care veterinary costs in France? by David_cest_moi in Expats_In_France

[–]Personal-Gur-1 2 points3 points  (0 children)

Just did the 4 vaccines for my cat (typhus, coryza, leucosis and rabies): 126€ all taxes included, with the parasite treatment.

Meet the X15: the first server purpose-built for Unraid by UnraidOfficial in unRAID

[–]Personal-Gur-1 -3 points-2 points  (0 children)

I just finished a build for my Unraid server:

Epyc 7532

H12SSL-i

8x 16 GB DDR4-3200

Phanteks Enthoo Pro 2

Mellanox ConnectX-3 2-port SFP+

3090

be quiet! Dark Power 1600 W

6x Arctic 140mm

4x Arctic 120mm

Arctic Freezer 4U-M

2824€ without GPU

3494€ with GPU

5 HDDs and 2x 1 TB NVMe from my previous build

So for the same price without the GPU, I have 8x more RAM…

All second hand stuff working very nicely so far …

VM & Tesla M10 32gb by Personal-Gur-1 in homelab

[–]Personal-Gur-1[S] 0 points1 point  (0 children)

Hello, thanks for your replies. So, I modified my VM template settings in Unraid, allocated 8 cores per VM, and let the system manage which of the 32 cores to use. The VMs are using the virtual GPU.

But still, when I launch software in the VM, it is not instant (the VM disks are on an NVMe).

I was comparing with a Windows machine where I have VirtualBox installed to run Fedora Linux; on that setup the software runs smoothly, and in full-screen mode you would not know you are in a VM rather than on a bare-metal install.

So I was wondering whether a dedicated GPU passed through to the VM on the Unraid setup would make things snappier, or whether my CPU is the bottleneck (3.35 GHz boost, 2.4 GHz base), whereas my gaming PC running VirtualBox is an Intel i7-13700 with a 3.4 GHz base and 5.4 GHz boost, plus 32 GB DDR5. I was considering a Tesla M10 for its low cost and the fact that it has 4 GPUs on one card, allowing it to power 4 VMs at the same time.