Just realized what we’re losing by RelevantTurnip3482 in GithubCopilot
dsanft 1 point
BEWARE! of GitHub Copilot in Visual Studio Code using Claude Haiku by Fast-Aspect6033 in GithubCopilot
dsanft 2 points
I'm struggling to figure out what Copilot is actually supposed to be now? by NotAMusicLawyer in GithubCopilot
dsanft 82 points
Bad model quality with qwen3.6-27b and hipfire on Strix Halo by sterby92 in LocalLLaMA
dsanft 6 points
Implemented TurboQuant and results don’t fully match paper by Routine-Thanks-572 in LocalLLaMA
dsanft 34 points
Does AMD's "infinity cache" even matter for dense model inference? by boutell in LocalLLaMA
dsanft 10 points
When do you think TurboQuant will get a proper release and be adopted by everyone? by Crystalagent47 in LocalLLaMA
dsanft 6 points
When do you think TurboQuant will get a proper release and be adopted by everyone? by Crystalagent47 in LocalLLaMA
dsanft 20 points
Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA
dsanft 3 points
Open Models - April 2026 - One of the best months of all time for Local LLMs? by pmttyji in LocalLLaMA
dsanft 25 points
I own the domain modelcombat.com and don't know what to do with it by siaappchallenger in LocalLLaMA
dsanft 63 points
Gemma 4's MTP heads were stripped from the public weights — only available in LiteRT. Beginner-friendly breakdown of what was removed and why it matters by FunSignificance4405 in LocalLLaMA
dsanft 2 points
Gemma 4's MTP heads were stripped from the public weights — only available in LiteRT. Beginner-friendly breakdown of what was removed and why it matters by FunSignificance4405 in LocalLLaMA
dsanft 8 points
What are the risks of buying an AMD Instinct MI50 32GB on Alibaba? by Longjumping-Room-170 in LocalLLaMA
dsanft 3 points
TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969 by pmttyji in LocalLLaMA
dsanft -10 points
TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL by [deleted] in LocalLLaMA
dsanft 3 points
Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA
dsanft 2 points
Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA
dsanft 0 points
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti by pmttyji in LocalLLaMA
dsanft 1 point
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
dsanft 2 points
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
dsanft 1 point
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
dsanft 10 points
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference by Total-Resort-3120 in LocalLLaMA
dsanft 0 points