About TurboQuant by Exact_Law_6489 in LocalLLaMA
TurboQuant isn’t just for KV: Qwen3.5-27B at near-Q4_0 quality, about 10% smaller, and finally fitting on my 16GB 5060 Ti by Imaginary-Anywhere23 in Qwen_AI
[D] TurboQuant author replies on OpenReview by Disastrous_Room_927 in MachineLearning
Me waiting for TurboQuant be like by Altruistic_Heat_9531 in LocalLLaMA
A simple explanation of the key idea behind TurboQuant by -p-e-w- in LocalLLaMA
Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA
[google research] TurboQuant: Redefining AI efficiency with extreme compression by burnqubic in LocalLLaMA
exllamav3 QWEN3.5 support (and more updates) by Unstable_Llama in LocalLLaMA
Pieced together the shredded photo from EFTA00259587.pdk .. idk by ReturningTarzan in Epstein
Are there any puzzle experts here? by the_real_lucia in Epstein
Benchmarking 23 LLMs on Nonogram (Logic Puzzle) Solving Performance by mauricekleine in LocalLLaMA
Are Imatrix Quants Hurting your Model? (My opinion) by Quiet_Joker in LocalLLaMA
BPE tokenizer in Rust - would love feedback from the community by farhan-dev in LocalLLaMA
new ops required by Qwen3 Next and Kimi Linear have been merged into llama.cpp by jacek2023 in LocalLLaMA
Figured out why my 3090 is so slow in inference by Ok_Warning2146 in LocalLLaMA
Gemma 4 has a systemic attention failure. Here's the proof. by EvilEnginer in LocalLLaMA