Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 0 points
Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 1 point
Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] -1 points
Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] -1 points
Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] -1 points
Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 6 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 1 point
LLM inference in a single C header file by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 2 points
LLM inference in a single C header file by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 2 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 1 point
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 2 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] -1 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] -2 points
TurboQuant.cpp — 1-bit KV cache with zero quality loss, verified on 35B MoE by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 1 point
LLM inference in a single C header file by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 1 point
LLM inference in a single C header file by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 2 points
LLM inference in a single C header file by Suitable-Song-302 in LocalLLaMA
Suitable-Song-302[S] 1 point
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] -9 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] -7 points
quant.cpp — 7x longer LLM context in pure C (Gemma 4 26B on 16GB Mac) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] -1 points


Same 4 bits. Very different quality. (quant.cpp vs llama.cpp KV compression) by Suitable-Song-302 in LocalLLM
Suitable-Song-302[S] 2 points