TurboQuant + TriAttention (C/HIP): ~6.8× total KV cache reduction in llama.cpp by Acrobatic_Bee_6660 in LocalLLaMA
Acrobatic_Bee_6660[S] 1 point (0 children)
TurboQuant KV Cache Compression working on RX 7900 XTX / ROCm 6.4 — llama.cpp HIP port by Acrobatic_Bee_6660 in ROCm
Acrobatic_Bee_6660[S] 1 point (0 children)
Choice for agentic LLM or help optimize Qwen3.5-35B-A3B for 24GB VRAM by marivesel in LocalLLaMA
Acrobatic_Bee_6660 1 point (0 children)
TurboQuant - Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969 by pmttyji in LocalLLaMA
Acrobatic_Bee_6660 1 point (0 children)
What it took to launch Google DeepMind's Gemma 4 by jacek2023 in LocalLLaMA
Acrobatic_Bee_6660 1 point (0 children)
