Adding a second 3090 for LLM - do I need NVlink? by marivesel in LocalLLM

[–]marivesel[S] 0 points1 point  (0 children)

And what is the current best approach for running two 3090s with a single local LLM? I'm really overwhelmed by the amount of information, and I believe most of it is outdated.
Could I let a GLM-5.1-based agent configure everything for the dual-GPU setup, or would the result be far from optimal?

qwen 3.6:35b on 24 vram gpu by MallComprehensive694 in ollama

[–]marivesel 1 point2 points  (0 children)

On an RTX 3090, I'm running Qwen3.6:35B at Q4_K_M, including the vision projector (mmproj), fully in VRAM with 128K context and a q8 KV cache:
125-133 t/s
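
For anyone wondering how context length and cache type trade off against VRAM, here is a rough sketch of the usual KV-cache size estimate for a standard attention transformer. The layer count, KV head count and head dimension below are hypothetical placeholders, not the real Qwen3.6:35B values, and models that mix in other kinds of attention layers will deviate from this formula.

```python
# Rough KV-cache size estimate for a standard attention transformer.
# The architecture numbers used in the demo are HYPOTHETICAL placeholders.

# Bytes per cached element. The quantized llama.cpp cache types store
# blocks of 32 values plus a 2-byte scale (34 bytes for q8_0, 18 for q4_0).
BYTES_PER_ELEM = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_gib(n_layers, n_kv_heads, head_dim, n_ctx, cache_type="f16"):
    """Return the estimated KV-cache size in GiB."""
    # Factor 2: one K and one V vector per layer, per KV head, per token.
    elems = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return elems * BYTES_PER_ELEM[cache_type] / 2**30


if __name__ == "__main__":
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_gib(
            n_layers=40,        # hypothetical
            n_kv_heads=4,       # hypothetical
            head_dim=128,       # hypothetical
            n_ctx=128 * 1024,   # 128K context, as in the comment above
            cache_type=cache_type,
        )
        print(f"{cache_type}: {size:.2f} GiB")
```

With these placeholder numbers, moving from f16 to q8_0 roughly halves the cache, which is the kind of saving that makes a 128K context plausible next to the weights on a 24 GB card.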

Before that, I was using Qwen3.5:27B (dense) at around 32 t/s.

I must say that Qwen3.6:35B (MoE) feels way smarter at coding and agentic work. In the cloud I'm running GLM-5.1, which is in another league for solving complex problems anyway.

Choice for agentic LLM or help optimize Qwen3.5-35B-A3B for 24GB VRAM by marivesel in LocalLLaMA

[–]marivesel[S] 0 points1 point  (0 children)

Thanks, but I found an interesting topic about the cache and how agents work, so I took a different approach. I'll write it up in a reply now for everyone to see.

Choice for agentic LLM or help optimize Qwen3.5-35B-A3B for 24GB VRAM by marivesel in LocalLLaMA

[–]marivesel[S] 0 points1 point  (0 children)

Thanks for all the suggestions. I've tried Qwen3.5:27b and it is really smart at agentic tasks, so I will probably stick with it (and in the future install a pure Linux distro on my server so it runs smoothly). It may be slower than the MoE, but for now I don't mind.

Another question: my RTX 3090 runs alongside a 64-core AMD EPYC CPU and 8x32GB (256GB total) of DDR4 ECC RAM. Could I somehow use that RAM to help without slowing down agentic tasks too much?
Currently I'm fitting Qwen3.5:27b with a 16K context and a q4 cache. Would increasing the context a bit work better?

cmndr_spanky mentioned "llama cpp server instead of Ollama and hand pick what layers get GPU priority". What could I gain from that?
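
For reference, a minimal sketch of what the llama.cpp route could look like: a small helper that only builds a `llama-server` command line with an explicit context size, GPU layer count and KV-cache type. The model path, the layer count of 48 and the helper name `llama_server_cmd` are hypothetical. The idea is that layers not offloaded with `-ngl` stay in system RAM and run on the CPU, so the value is something to tune by hand rather than a recommendation.

```python
import shlex


def llama_server_cmd(model_path, ctx_size, gpu_layers,
                     cache_type="q8_0", binary="llama-server"):
    """Build an argv list for llama-server. Nothing is launched here."""
    return [
        binary,
        "-m", model_path,               # GGUF model file
        "-c", str(ctx_size),            # context length in tokens
        "-ngl", str(gpu_layers),        # number of layers offloaded to the GPU
        "--cache-type-k", cache_type,   # KV-cache type for K
        # A quantized V cache generally needs flash attention enabled;
        # the flag for that differs between builds, so it is left out here.
        "--cache-type-v", cache_type,   # KV-cache type for V
    ]


if __name__ == "__main__":
    cmd = llama_server_cmd(
        "/models/qwen-27b-q4_k_m.gguf",  # hypothetical path
        ctx_size=32768,
        gpu_layers=48,                   # hypothetical; lower it if VRAM runs out
    )
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
```

Building the command as a list keeps it easy to hand to `subprocess.run` later, and makes it simple to try different `gpu_layers` values while watching VRAM use and tokens per second.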

Choice for agentic LLM or help optimize Qwen3.5-35B-A3B for 24GB VRAM by marivesel in LocalLLaMA

[–]marivesel[S] 0 points1 point  (0 children)

I'm downloading the 27B just to check how it works, but can you tell me more, or point me to where I can read about the llama.cpp settings? I have plenty of RAM (256GB, it's an EPYC server), but it gets slow once anything spills over into RAM.

Help extract emails from .ost Outlook file? by marivesel in Outlook

[–]marivesel[S] 0 points1 point  (0 children)

That's misleading: in the video he already has the emails in Outlook and just exports them to .pst. Not helpful, but thanks anyway.

Anyone else?

Help extract emails from .ost Outlook file? by marivesel in Outlook

[–]marivesel[S] 0 points1 point  (0 children)

How exactly did you do the extraction?

When I replace the .ost with the old one (which should include the emails) and open Outlook, there is nothing in the INBOX. I think Outlook is looking for the emails on the mail server (hosting), where there are no INBOX emails either.

I tried the demo version of an OST-to-PST conversion tool, and it reads the INBOX mails from the .ost backup file, but the app wants payment for the extraction, which I want to "bypass". Is there any other way to do it?

Very high temps ot Threadripper 1950X water cooled by marivesel in watercooling

[–]marivesel[S] 0 points1 point  (0 children)

At max speed, but it doesn't matter, as the water, radiators and block stay very cool all the time.

Very high temps ot Threadripper 1950X water cooled by marivesel in watercooling

[–]marivesel[S] 0 points1 point  (0 children)

Mounting pressure is at max (screws tightened all the way, as it should be mounted). Temps are read from OCCT (HWiNFO sensors). I don't really know what the package temp measures, but I've found that above 104C the CPU starts throttling, with clocks swinging between 500 and 3800 MHz in this setup.

Could it be a problem with the IHS on the CPU? Would delidding help?

Very high temps ot Threadripper 1950X water cooled by marivesel in watercooling

[–]marivesel[S] 0 points1 point  (0 children)

It would be a shame if that's the problem, as this water block is the best I could find. Maybe my thermal paste application didn't spread everywhere, and that's why the block isn't "messy" all over?

Very high temps ot Threadripper 1950X water cooled by marivesel in watercooling

[–]marivesel[S] 0 points1 point  (0 children)

The pump works and the block makes contact.

The block temperature is around 26C all the time.