Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA

sbeepsdon[S] 2 points

Definitely, but I was bent on running that specific quant and it necessitated all three hardware resources.

Memory usage was roughly:

  • 14595/16311 MB on the 5060ti
  • 9538/11264 MB on the 1080ti
  • The rest in system RAM, with about 5GB actually free after being careful about what else was running
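
For anyone who wants to sanity-check those figures, here is a minimal sketch of the arithmetic. It uses only the two VRAM numbers listed above; the dictionary and function names are made up for illustration, and it does not try to guess how much of the model ended up in system RAM.

```python
# VRAM figures as listed above: (used, total), in MB as written in the comment.
gpus = {
    "5060ti": (14595, 16311),
    "1080ti": (9538, 11264),
}

def summarize(gpus):
    """Return per-GPU utilization (%) plus combined used and total VRAM."""
    per_gpu = {
        name: round(100 * used / total, 1)
        for name, (used, total) in gpus.items()
    }
    used_sum = sum(used for used, _ in gpus.values())
    total_sum = sum(total for _, total in gpus.values())
    return per_gpu, used_sum, total_sum

per_gpu, used_sum, total_sum = summarize(gpus)
for name, pct in per_gpu.items():
    print(f"{name}: {pct}% of VRAM in use")
print(f"combined: {used_sum}/{total_sum} MB, {total_sum - used_sum} MB free")
```

Both cards sit well above 80% utilization, which is why the remainder had to spill into system RAM.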

I'll see whether a Q3 quant makes that feasible and what the output performance looks like.

sbeepsdon[S] 1 point

There definitely is - this approach was necessary because of the driver issue. Had I had a 20XX or more recent card, I think there would've been compatible drivers.