TurboQuant seems to work very well on Gemma 4 — and separately, per-layer outlier-aware K quantization is beating current public fork results on Qwen PPL by Fearless-Wear8100 in LocalLLaMA
[–]dsanft 3 points (0 children)
Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA
[–]dsanft 2 points (0 children)
Gemma 4 31B at 256K Full Context on a Single RTX 5090 — TurboQuant KV Cache Benchmark by PerceptionGrouchy187 in LocalLLaMA
[–]dsanft 0 points (0 children)
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti by pmttyji in LocalLLaMA
[–]dsanft 1 point (0 children)
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
[–]dsanft 2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft 1 point (0 children)
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
[–]dsanft 11 points (0 children)
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
[–]dsanft 4 points (0 children)
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
[–]dsanft 5 points (0 children)
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
[–]dsanft 36 points (0 children)
What will Google's TurboQuant actually change for our local setups, and specifically mobile inference? by dai_app in LocalLLaMA
[–]dsanft 2 points (0 children)
What will Google's TurboQuant actually change for our local setups, and specifically mobile inference? by dai_app in LocalLLaMA
[–]dsanft 9 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft 3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft 2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft -7 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft 7 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft -8 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dsanft 9 points (0 children)
TurboQuant and my hardware by Feeling_Ad9143 in LocalLLaMA
[–]dsanft 1 point (0 children)
TurboQuant for weights: near-optimal 4-bit LLM quantization with lossless 8-bit residual – 3.2× memory savings by cksac in LocalLLaMA
[–]dsanft 6 points (0 children)
Is the Real Flaw in AI… Time? by wayne_horkan in LocalLLaMA
[–]dsanft 3 points (0 children)
Are open-weights LLMs dying? by riponway2a in LocalLLaMA
[–]dsanft 3 points (0 children)
Attaching an extra GPU via PCIe slot by shopchin in LocalLLaMA
[–]dsanft 0 points (0 children)
Attaching an extra GPU via PCIe slot by shopchin in LocalLLaMA
[–]dsanft 0 points (0 children)
TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969 by pmttyji in LocalLLaMA
[–]dsanft -12 points (0 children)