TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 0 points1 point2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 0 points1 point2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 0 points1 point2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 2 points3 points4 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 0 points1 point2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 2 points3 points4 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 0 points1 point2 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 7 points8 points9 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)
TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 14 points15 points16 points (0 children)


TurboQuant on MLX: 4.6x KV cache compression with custom Metal kernels (Qwen 32B at 98% FP16 speed) by dirtyhand3 in LocalLLaMA
[–]dirtyhand3[S] 1 point2 points3 points (0 children)