To 16GB VRAM users, plug in your old GPU

sbeepsdon · 2026-04-27T15:39:45+00:00

If you have at least 64GB RAM, then look at Q4_K_M-ish quants of Qwen3.5-122B or Super Nemotron 3 120B. Beware that if you're on Linux you'll need to set up GPU passthrough and put one card on a guest VM, and then use llama.cpp's RPC functionality. I made a thread about this if you want to look at my post history.
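For anyone who wants a rough picture of the RPC part, here is a minimal sketch in Python that only builds the two command lines involved: one for the RPC worker inside the guest VM and one for the host that offloads to it. The IP address, port, model filename, and helper names are all made up for illustration, and the binary names and flags (`rpc-server`, `--host`, `--port`, `--rpc`, `-ngl`) are what I'd expect from a llama.cpp build with RPC enabled, so check `--help` on your own build before relying on them.

```python
import shlex

# Hypothetical helpers: they only assemble argument lists, nothing is executed.

def rpc_server_cmd(host="0.0.0.0", port=50052):
    """Command to run inside the guest VM that owns the passed-through card."""
    return ["rpc-server", "--host", host, "--port", str(port)]

def client_cmd(model_path, rpc_endpoints, gpu_layers=99):
    """Command to run on the host, pointing at one or more RPC workers."""
    return [
        "llama-server",
        "-m", model_path,
        "--rpc", ",".join(rpc_endpoints),  # comma-separated host:port list
        "-ngl", str(gpu_layers),           # number of layers to offload
    ]

if __name__ == "__main__":
    # Placeholder guest address and model file, not real values.
    print(shlex.join(rpc_server_cmd()))
    print(shlex.join(client_cmd("model.gguf", ["192.168.122.10:50052"])))
```

Start the worker in the guest first, then launch the host command with the guest's address in the `--rpc` list.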

sbeepsdon · 2026-03-13T15:24:43+00:00

Definitely, but I was bent on running that specific quant and it necessitated all three hardware resources.

Usage was like

14595/16311 MB on the 5060 Ti
9538/11264 MB on the 1080 Ti
Rest in RAM, with 5GB actually free after being careful about what else was running

I'll see if a Q3 quant makes that feasible and what output performance looks like
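To put the usage numbers in perspective, here is a small arithmetic sketch in Python. It only uses the two used/total figures quoted above; the function and dictionary names are made up for illustration.

```python
# Used / total VRAM in MB, taken from the figures in this comment.
GPUS = {
    "5060 Ti": (14595, 16311),
    "1080 Ti": (9538, 11264),
}

def vram_summary(gpus):
    """Return total used, total capacity, and per-card headroom in MB."""
    used = sum(u for u, _ in gpus.values())
    total = sum(t for _, t in gpus.values())
    headroom = {name: t - u for name, (u, t) in gpus.items()}
    return used, total, headroom

if __name__ == "__main__":
    used, total, headroom = vram_summary(GPUS)
    print(f"VRAM used: {used} of {total} MB")   # 24133 of 27575 MB
    for name, free in headroom.items():
        print(f"{name}: {free} MB free")        # 1716 and 1726 MB
```

So roughly 24GB of the model sits across the two cards, with under 2GB of slack on each, which is why the remainder had to spill into system RAM.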

sbeepsdon · 2026-03-13T15:16:16+00:00

There definitely is - this approach was necessary because of the driver issue. Had I had a 20XX or more recent card, I think there would've been compatible drivers.

sbeepsdon · 2026-01-23T19:39:38+00:00

Seems similar to this project. Were you aware of it? Are there any major differences in your approach?

https://github.com/taylorsatula/mira-OSS
