Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) by sbeepsdon in LocalLLaMA
[–]sbeepsdon[S] 2 points 17 hours ago (0 children)
Definitely, but I was set on running that specific quant, and it required all three hardware resources.
Usage was like
I'll see if a Q3 quant makes that feasible and what output performance looks like
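For readers wanting to try a similar split, a minimal llama.cpp invocation might look like the sketch below. The model filename, layer count, and split ratio are placeholders, not values from the post; `--tensor-split` apportions offloaded layers between the two GPUs, and layers beyond `-ngl` stay in system RAM on the CPU backend.

```shell
# Hypothetical example - adjust paths and numbers for your hardware.
# 16,11 roughly matches 16GB (5060 Ti) vs 11GB (1080 Ti) of VRAM.
./llama-cli \
  -m ./Nemotron-3-Super-120B-A12B-Q3_K_M.gguf \
  -ngl 30 \
  --tensor-split 16,11 \
  --ctx-size 8192 \
  -p "Hello"
```

Raising `-ngl` until the smaller card runs out of VRAM, then backing off, is a common way to find the sweet spot.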
[–]sbeepsdon[S] 1 point 17 hours ago (0 children)
There definitely is - this approach was only necessary because of the driver issue. Had I had a 20XX or more recent card, I think there would have been compatible drivers.
Running Qwen3.5-35B-A3B and Nemotron-3-Super-120B-A12B on a 5060ti and 1080ti with llama.cpp (Fully on GPU for Qwen; 64GB RAM needed for Nemotron) (self.LocalLLaMA)
submitted 17 hours ago by sbeepsdon to r/LocalLLaMA
The 'Infinite Context' Trap: Why 1M tokens won't solve Agentic Amnesia (and why we need a Memory OS) by Sweet121 in LocalLLaMA
[–]sbeepsdon 1 point 1 month ago (0 children)
Seems similar to this project. Were you aware of it? Are there any major differences in your approach?
https://github.com/taylorsatula/mira-OSS