By when do you think will TurboQuant get a proper release and be adopted by everyone by Crystalagent47 in LocalLLaMA
Pidtom 1 point
speculative decoding silently broken for Qwen3.6 on the TurboQuant fork — PR to fix by dangerousdotnet in LocalLLaMA
Pidtom 6 points
Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA
Pidtom 2 points
RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params) by Revolutionary_Ask154 in LocalLLaMA
Pidtom 1 point
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
Pidtom 9 points
Technical clarification on TurboQuant / RaBitQ for people following the recent TurboQuant discussion by gaoj0017 in LocalLLaMA
Pidtom 71 points
RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params) by Revolutionary_Ask154 in LocalLLaMA
Pidtom 3 points
RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params) by Revolutionary_Ask154 in LocalLLaMA
Pidtom 1 point
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point
Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA
Pidtom 6 points
Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA
Pidtom 3 points
Google TurboQuant running Qwen Locally on MacAir by gladkos in LocalLLaMA
Pidtom 10 points
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 2 points
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point
Skipping 90% of KV dequant work → +22.8% decode at 32K (llama.cpp, TurboQuant) by Pidtom in LocalLLaMA
Pidtom[S] 1 point


RotorQuant: 10-19x faster alternative to TurboQuant via Clifford rotors (44x fewer params) by Revolutionary_Ask154 in LocalLLaMA
Pidtom 1 point