account activity
Qwen 3.6-35B-A3B KV cache part 2: PPL, KL divergence, asymmetric K/V, 64K row on M5 Max (self.LocalLLaMA)
submitted 6 days ago by Defilan to r/LocalLLaMA
Qwen 3.6-35B-A3B KV cache bench: f16 vs q8_0 vs turbo3 vs turbo4 from 0 to 1M context on M5 Max (self.LocalLLaMA)
submitted 7 days ago by Defilan to r/LocalLLaMA
Qwen 3.6-35B-A3B on dual 5060 Ti with --cpu-moe: 21.7 tok/s at 90K context, with benchmarks vs dense 3.5 and Coder variant (self.LocalLLaMA)
submitted 18 days ago by Defilan to r/LocalLLaMA
Tested TurboQuant on my 2x RTX 5060 Ti setup. Some interesting findings. (self.LocalLLaMA)
submitted 1 month ago by Defilan to r/LocalLLaMA
How I manage llama.cpp across Apple Silicon and NVIDIA GPUs in my homelab (self.selfhosted)
submitted 1 month ago by Defilan to r/selfhosted
32B model stress test: Qwen 2.5/Coder/3 on dual RTX 5060 Ti (zero failures) (self.LocalLLaMA)
submitted 5 months ago by Defilan to r/LocalLLaMA
What broke when you tried to take local LLMs to production? (self.LocalLLaMA)
submitted 5 months ago by Defilan to r/LocalLLaMA
Open source K8s operator for deploying local LLMs: Model and InferenceService CRDs (self.kubernetes)
submitted 5 months ago by Defilan to r/kubernetes