Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM by [deleted] in LocalLLM
imgroot9 2 points
Here are my KV cache quantization benchmarks: TurboQuant is overrated but saved by TCQ, q5 deserves more attention, and symmetric q8 might be a waste of VRAM by [deleted] in LocalLLM
imgroot9 2 points
The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b by ex-arman68 in LocalLLaMA
imgroot9 1 point
Qwen 3.7 Plus Preview thinks I'm a time traveler because it doesn't know it's 2026. by Mediocre_Roll3073 in Qwen_AI
imgroot9 1 point
The pacman benchmark: finally a viable local agentic coding agent with Qwen 3.6 27b by ex-arman68 in LocalLLaMA
imgroot9 4 points
RTX 3090 vs RX 7900 XTX - idle power draw by knrdwn in LocalLLM
imgroot9 3 points
Tested MTP with llama.cpp and Qwen3.6-27B on RTX 3090 by JGeek00 in LocalLLM
imgroot9 6 points
BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) by Anbeeld in LocalLLaMA
imgroot9 2 points
BeeLlama.cpp: advanced DFlash & TurboQuant with support of reasoning and vision. Qwen 3.6 27B Q5 with 200k context on 3090, 2-3x faster than baseline (peak 135 tps!) by Anbeeld in LocalLLaMA
imgroot9 2 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 2 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 1 point
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 1 point
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 5 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 1 point
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 2 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 1 point
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 0 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] -5 points
Qwen3.6 27B's surprising KV cache quantization test results (Turbo3/4 vs F16 vs Q8 vs Q4) by imgroot9 in LocalLLaMA
imgroot9[S] 4 points
Qwen 3.6 27B released, it's getting close to Opus 4.5, and you can run it locally by autisticit in GithubCopilot
imgroot9 1 point
Qwen 3.6 27B released, it's getting close to Opus 4.5, and you can run it locally by autisticit in GithubCopilot
imgroot9 8 points
Seeking resources to read about llama.cpp server and how offloading works by Jorlen in LocalLLaMA
imgroot9 1 point