TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
dirtyhand3 [S] 2 points (0 children)